The improved techniques of molecular graphics have triggered very rapid advances in molecular modelling, whether of structures or of behaviour. Jean-Louis Rivail's proposal to devote our international meeting to molecular modelling was therefore approved practically without discussion, and the way the conference went proved it had been a good choice.

Allow me to recall here what is almost a return to Genesis. I have a vivid memory of attending, as a student, a symposium on "The Chemical Bond" held in Paris in 1948. Mulliken, Pauling, Raman, Louis de Broglie, Ketelaar, Cabannes, Coulson and the bright young people of theoretical chemistry compared views that, fortunately, did not always converge. This symposium is now practically part of the legend of theoretical chemistry. It took place at a time when many experimentalists judged the field an intellectual game of dubious interest: "I'll believe in these computations when those people are able to predict something, not just demonstrate what is already known." Theoreticians have since proved themselves.

They were fabulously helped by the steady improvement in computer performance. As a young research fellow, I remember being shown a computer in the research laboratory of a great hardware manufacturer. It occupied nearly 100 square metres of floor space and could compute "so fast" that its innumerable errors were unimportant. It was sufficient to do the same computation 100 times and plot the corresponding Gaussian curve, whose summit gave the right value. Such memories set one dreaming.

At the same time, crystal structures required very lengthy work, whereas nowadays one must carefully select, for its interest, a structure that will be determined very quickly. By contrast, the field of "chemical computation" is still divided. Some research on molecular reactivity still involves laborious "number crunching". In other areas, molecular modelling among them, many subtle pieces of software can display the structures and behaviour of complex molecules quite simply.

Our Scientific Committee (J.Y. Lallemand, R. Lavery, J.P. Mornon, G. Pepe, J.L. Rivail, B. Robinet, E. Soulie and G. Vergoten) therefore had no great difficulty in bringing together a brilliant group of surveys and contributions. They covered all the ground from general methods and theory to computational techniques. The heart of the conference was, of course, specific modelling research:

- catalysis and adsorption;
- classical methods as applied to model biosystems and pharmacology;
- quantum models;
- spectra;
- kinetic models.

As expected, molecular modelling in its biophysical aspects had a prominent place. It dealt with methods, proteins and polypeptides, DNA, and joint modelling-NMR studies.


We also thought it necessary to assess fields at the borderline of our central theme but essential to our command of computing advances. Hence our choice of surveys on the numerical basis of optimization and on the "molecular fallout" of research on artificial intelligence.

And since technical developments are crucial, we held a panel discussion on the evolution of computer architecture, in which IBM, Cray, FPS and Convex took part. The overall presentation of the subject, along with the remarks of leading hardware designers, makes a brief but rewarding chapter.


As always, this conference is the work of those who brought a wealth of novel results along with a series of brilliant surveys. In this respect we owe special thanks to several colleagues who were prevented from attending by illness or by circumstances beyond their control. They were kind enough to send in their contributions: W.F. van Gunsteren, I. Vaisman, T. Solmajer and O. Iordache. We are also pleasantly indebted to the many attendees, since we have seldom had such lively and profitable discussions. Our thanks go to all those who made this symposium a scientific success.


This meeting was also a celebration of the role played by Alberte and Bernard Pullman in the development of Quantum Chemistry, and particularly of its applications to Biophysics. They were pioneers on this wonderfully fertile ground. Science owes them research of the highest order, along with the training of many pupils who follow brilliantly in the tracks they opened. We took great pleasure in building this conference as a homage to the Pullmans. The homage was completed by our Society's Council, which elected them Honorary Members.

Finally, this 44th International Meeting of Physical Chemistry received help and financial support from many organizations and firms to whom we owe much. First of all, we thank the University of Nancy I and Nancy City Council, who welcomed us. We also thank the Lorraine Regional Council, the Nancy Urban District Authority and the Sciences of Matter Department of the University.

We also received help and support from the following organizations:

- C.N.R.S. (secteur Chimie);
- Commissariat à l'Énergie Atomique (DLPC);
- Direction des Recherches, Études et Techniques;
- European Office of Aerospace Research and Development (USAF);
- Hotels Concorde;
- I.B.M. France;
- Ministère de l'Éducation Nationale (Direction des relations internationales);
- Ministère de la Recherche et de la Technologie;
- U.S. Army European Research Office;
- U.S. Office of Naval Research, London branch.

We are grateful for this support, which allowed us, among other things, to ensure the attendance of young researchers and of colleagues from countries where research budgets are limited.


Clement Troyanowsky
Honorary Secretary
S.F.C. / Division de Chimie physique

FOREWORD

Progress in molecular graphics has brought about very rapid advances in molecular modelling, whether of structures or of properties. Jean-Louis Rivail's proposal to devote our international meeting to progress in modelling was therefore adopted practically without discussion. The course of the meeting clearly showed that we had made the right choice.

Allow me what is almost a return to origins. As a student, I followed with passion a symposium on the chemical bond held in Paris in 1948, and I keep a vivid memory of it. Mulliken, Pauling, Raman, Louis de Broglie, Ketelaar, Cabannes, Coulson and the brilliant young people of theoretical chemistry exchanged views there which, fortunately, did not always converge. This symposium has become something of a legend. At that time many experimentalists saw theoretical chemistry only as an intellectual game of uncertain interest: "We'll believe in it when the theoreticians are able to predict something, and not merely demonstrate what is already known." Theoreticians have since proved themselves.

They were fabulously helped by the ever-improving performance of computers. As a young teacher, I remember seeing a computer at one of the world's great manufacturers. The whole installation occupied about 100 square metres and computed "so fast" that its innumerable errors did not matter. It was enough to do each calculation a hundred times and draw the corresponding Gaussian curve: its summit was the value sought! Memories of this kind make one dream.

At the same time, the study of crystal structures was a very long-drawn-out task; today one must choose carefully, for its interest, the structure that will be determined quickly. By contrast, the field of "chemical computation" remains divided. Some work on molecular reactivity still involves laborious "number crunching". In other areas, of which modelling offers many examples, subtle software makes it possible to display the structures and behaviour of complex molecules quite simply.

Our Scientific Committee (J.Y. Lallemand, R. Lavery, J.P. Mornon, G. Pepe, J.L. Rivail, B. Robinet, E. Soulie and G. Vergoten) thus had little difficulty in assembling a brilliant set of lectures and contributions. They covered the ground from general methods and theory to computational techniques. The heart of the meeting was, of course, made up of specific modelling work:

- catalysis and adsorption;
- classical modelling methods and their applications to model biosystems and to pharmacology;
- quantum models;
- spectra;
- kinetic models.

As expected, molecular modelling in its biophysical aspects occupied a prominent place. This covered methods, proteins and polypeptides, DNA, and joint modelling and NMR studies.

We also felt it necessary to take stock of subjects that are marginal to our central theme but essential to all developments in computing. The reader will therefore find here a fine introduction to the problems of numerical optimization. There is also a presentation of the "molecular" fallout of work on artificial intelligence.

Finally, given the crucial importance of technical developments, we held a round table on the evolution of computer architecture, in which several of the world's largest manufacturers took part. The overall presentation of the subject, together with the remarks of the "hardware" designers, forms a brief but certainly rewarding chapter.




This meeting, as always, is the work of those who brought to it a remarkable variety of new results and high-quality overviews. In this respect we owe special thanks to several colleagues who were prevented from attending by illness or by circumstances beyond their control. They nevertheless made a point of providing their contributions: W.F. van Gunsteren, I. Vaisman, T. Solmajer and O. Iordache. Our symposium also owes a pleasant debt to the participants, since we have not often had such lively and profitable discussions. We thank here all those who contributed to the scientific success of this symposium.

This symposium was also a celebration of the role played by Alberte and Bernard Pullman in the development of Quantum Chemistry, and particularly of its applications to Biophysics. They were among the pioneers of this marvellously fertile ground. Science owes them first-rate work, as well as the training of numerous disciples who brilliantly follow the paths they opened. We were happy to conceive this meeting as a homage to their work. The homage was completed by the Council of our Society, of which the Pullmans are now Honorary Members.

Finally, this 44th International Meeting of Physical Chemistry benefited from the material and financial support of many organizations and companies, to which we are grateful. First, we thank the University of Nancy I and the City of Nancy for their hospitality. We also thank the local and regional authorities for their contribution to balancing our budget: the Lorraine Region, the Nancy Urban District, and the UER des Sciences de la Matière of the University.

We also received encouragement and support from the following organizations:

- C.N.R.S. (secteur Chimie);
- Commissariat à l'Énergie Atomique (DLPC);
- Direction des Recherches, Études et Techniques;
- European Office of Aerospace Research and Development (USAF);
- F.P.S. Computing;
- Hotels Concorde;
- I.B.M. France;
- Ministère de l'Éducation Nationale (Direction des relations internationales);
- Ministère de la Recherche et de la Technologie;
- U.S. Army European Research Office;
- U.S. Office of Naval Research, London branch.

May they all be assured of our gratitude. This support enabled us, among other things, to assist young researchers and colleagues from countries with modest research budgets.


Clement Troyanowsky
General Secretary
S.F.C. / Division de Chimie physique

BUCKINGHAM A.D./ Cambridge Univ./ Univ. Chemical Lab./ Lensfield Rd/ CAMBRIDGE
CB2 1EW/ (G.B.)

CAILLET J./ Dynamique des Int. Moleculaires/ Tour 22/ couloir 22-23/ 1er etage/
4, place Jussieu/ 75005 PARIS

CARLES P./ St Procida/ B.P. 1/ 13367 MARSEILLE Cedex 11

CASTILLO S. Mme/ Lab. recherche sur l'energie/ Univ. Paul Sabatier/ 118,
route de Narbonne/ 31062 TOULOUSE Cedex

CASTRO B./ SANOFI Chimie/ 17, rue des Fosses St Jacques/ 75005 PARIS

CENSE J.M./ E.N.S.C.P./ 11, rue P. et M. Curie/ 75231 PARIS Cedex 05

CHANTELOUBE M./ BIOSYM Technologies Inc./ 15, av. Victor Hugo/ 75116 PARIS

CHERMETTE H./ IPN/ Lab. Chimie nucleaire/ 43, Bd du 11 Novembre 1918/ 69622
VILLEURBANNE Cedex

CHOJNACKI H./ Inst. of Organic and Physical Chemistry/ Wyb. Wyspianskiego
27, 1-4/ 50-370 WROCLAW/ (Pologne)

CHOMILIER J./ Biophysique/ Museum national d'histoire Naturelle/ 43, rue
Cuvier/ 75231 PARIS Cedex 05

C.N.R.S./ I.N.I.S.T./ 26, rue Boyer/ 75020 PARIS

COLETTI PREVIERO M.A. Mme/ INSERM U 58/ rue de Navacelle/ 34000 MONTPELLIER

COLONNA F./ D.I.M./ Tour 22/ 1er etage/ 4, place Jussieu/ 75252 PARIS

COULOMBEAU C./ LEDSS VI/ Univ. J. Fourier/ BP 53 F/ 38041 GRENOBLE Cedex

CRASPAIL N. Mlle/ CRMC2-CNRS/ Campus Luminy Case 913/ 13288 MARSEILLE Cedex

CREUZET S. Mlle/ CEA/INRA/ CEN Saclay/ SBPH Bat. 532/ 91191 GIF SUR YVETTE

DASSONVILLE R./ CRMC2-CNRS/ 70, route Leon Lachamp/ Case 913/ Campus de
Luminy/ 13288 MARSEILLE Cedex 2

DAUTANT A./ Cristallographie/ Univ. Bordeaux 1/ 33405 TALENCE Cedex

DECORET C./ Lab. de Chimie Industrielle/ UCBL/ UA 805-CNRS/ Bat. 305/ 43, Bd
du 11 Novembre 1918/ 69622 VILLEURBANNE Cedex

DEHARENG D. Mme/ Univ. de Liege/ Lab. de Microbiologie/ Inst. de Chimie/ Bat.
B6/ SART TILMAN/ 4000 LIEGE (Belgique)

DELANCE E./ ITODYS/ 1, rue Guy de La Brosse/ 75005 PARIS

DEMANGE P./ Chimie physique macromoleculaire/ ENSIC-INPL/ BP 451/ 1, rue
Grandville/ 54001 NANCY Cedex

DERREUMAUX P./ Faculte de Pharmacie/ INSERM U279/ 59045 LILLE Cedex

DESMARQUETS M./ BIOSTRUCTURES/ 8, rue Gustave Adolphe Hirn/ 67000 STRASBOURG

DEVILLERS J./ I.M.R.C.P./ Univ. P. Sabatier/ 118, route de Narbonne/ 31400
TOULOUSE Cedex and Lab. chimie de coordination/ 205, route de Narbonne/ 31077
TOULOUSE Cedex

DIVE G./ Lab. microbiologie/ Univ. de Liege/ Institut de Chimie/ Bat B6/ SART
TILMAN/ 4000 LIEGE (Belgique)

DURUP J./ Physique quantique/ Univ. Paul Sabatier/ 118, route de Narbonne/
31062 TOULOUSE Cedex

DYMEK Chester J. Jr Lt-Col./ E.O.A.R.D./ 223 Old Marylebone Rd/ LONDON NW1 5TH
(G.B.)

ETCHEBEST C. Mlle/ IBPC/ 13, rue P. et M. Curie/ 75005 PARIS

EGMOND M.R./ Unilever Research Lab./ PO BOX 114/ 3130 AC VLAARDINGEN/ (Pays
Bas)

FAIEZ R./ Chimie theorique/ Univ. Nancy 1/ BP 239/ 54506 VANDOEUVRE LES NANCY

FATHALLAH M./ ESIPSOI UA 126/ Fac. des Sciences St Jerome/ Av. Escadrille
Normandie Niemen/ 13397 MARSEILLE Cedex 13

FAUPEL P./ Univ. GH Duisburg/ FB6, Theoretische Chemie/ Lotharstrasse 1/
4100 DUISBURG 1/ RFA

FAVROT J./ Structure et reactivite de molecules phosphorees/ Univ. P. Sabatier/
118, route de Narbonne/ 31062 TOULOUSE Cedex

FOURCOT F./ ESPCI/ Lab. Chimie analytique/ 10, rue Vauquelin/ 75231 PARIS
Cedex 05

FRITSCH V. Mlle/ I.B.M.C./ 15, rue R. Descartes/ 67084 STRASBOURG

GALLEYRAND G./ IBM France/ 67, quai de la Rapee/ 75012 PARIS

GALLO R./ ESIPSOI/ Fac. Sciences St Jerome/ Av. Normandie-Niemen/ 13397
MARSEILLE Cedex 13 F

GAYDOU E./ Phytochimie/ Fac. des Sciences St Jerome/ Av. Escadrille Normandie
Niemen/ 13397 MARSEILLE Cedex 13

GELIN B./ Polygen (Europe) Ltd/ 25, rue du Pont des Halles/ 94566 RUNGIS Cedex

GENEST D./ CBM/ CNRS/ 1A, av. de la Recherche Scientifique/ 45071 ORLEANS
Cedex 2

GENEST M. Mme/ CBM/ C.N.R.S./ 1A, av. de la Recherche Scientifique/ 45071
ORLEANS Cedex 2

GHOMI M./ Lab. de Spectroscopie Biomoleculaire/ UFR Biomedicale de Bobigny/
Univ. Paris XIII/ 74, rue Marcel Cachin/ 93012 BOBIGNY Cedex

GILQUIN B./ CISI-CEA/ Service de Biochimie/ Depart. Biologie/ 91191 GIF SUR
YVETTE

GINGOLD M./ Biophysique/ Depart. de Biologie/ CEN SACLAY/ 91191 GIF SUR
YVETTE Cedex

GIRE P./ CONVEX/ Parc d'Activites du Pas du Lac/ Immeuble le Daguerre/ 9,
av. Ampere/ 78180 MONTIGNY LE BRETONNEUX

GRAND A./ DRF/ Lab. de Chimie/ CENG/ 85X/ 38401 GRENOBLE Cedex

GRESH N./ Biochimie theorique/ I.B.P.C./ 13, rue P. et M. Curie/ 75005 PARIS

GUILLOT B./ Physique theorique des liquides/ Tour 16/ 4, place Jussieu/ 75252
PARIS Cedex 05

HAAS V. Mlle/ Univ. GH Duisburg/ FB6/ Theoretische Chemie/ Lotharstrasse 1/
4100 DUISBURG 1 (RFA)

HARTMANN B. Mme/ IBPC/ 13, rue P. et M. Curie/ 75005 PARIS

HATON M.C. Mme/ CRIN/ Univ. de Nancy I/ B.P. 239/ 54506 VANDOEUVRE LES NANCY

HENRIET C./ CRAY RESEARCH FRANCE/ 7, rue de Tilsitt/ 75017 PARIS

HERSCOVICI A./ IBM France/ Tour Descartes/ LA DEFENSE 5/ 2, av. Gambetta/
92400 COURBEVOIE

HOGGAN P./ Lab. Chimie theorique/ Univ. Nancy 1/ B.P. 239/ 54506 VANDOEUVRE
LES NANCY

HUIGE C.J.M./ Gorlaeus Lab./ State Univ. Leiden/ PO BOX 9502/ 2300 RA
LEIDEN/ (Pays-Bas)

HONGARO J.B./ Chimie physique/ Univ. Paris VII/ Tour 53-54 5eme etage/ 2,
place Jussieu/ 75005 PARIS

IORDACHE O./ Polytechnical Inst. of Bucharest/ Depart. of Chemical Eng./
1 Polizu/ R-78126 BUCHAREST 12/ (Roumanie)

JOFFRE J./ Chimie organique physique et Cinetique chimique appliquees/ ENSC/
8, rue de l'Ecole Normale/ 34075 MONTPELLIER Cedex

KARPLUS M./ Depart. of Chemistry/ Harvard Univ./ 12, Oxford Street/
CAMBRIDGE, Massachusetts 02138 (U.S.A.)

KASSAB E./ D.I.M./ Tour 22/ 1er etage/ Univ. P. et M. Curie/ 4, place Jussieu/
75252 PARIS Cedex

KOCHANSKI E. Mme/ ER 139/ Chimie theorique/ Inst. de Chimie/ 1, rue Blaise
Pascal/ B.P. 296/R8/ 67008 STRASBOURG

KOZELKA J./ Univ. Rene Descartes/ Chimie et Biochimie Pharmacologiques et
toxicologiques/ 45, rue des Saints Peres/ 75270 PARIS Cedex 06

KWIATKOWSKI J.S./ N. Copernicus Univ./ Inst. of Physics/ 87-100 TORUN
(Pologne)

LACROIX-DORE M.D. Mme/ IBM FRANCE/ 67, quai de la Rapee/ 75012 PARIS

LAGANT P./ Fac. Pharmacie Lille/ INSERM U 279/ 3, rue du Professeur Laguesse/
59045 LILLE Cedex

LAHANA M./ Aquitaine Systemes/ Tour Elf/ 2, place de la Coupole/ 92078 PARIS
LA DEFENSE

LAMBERT C. Mme/ I.C.I. PHARMA/ Centre de Recherches/ Chemin de Vrilly/ Z.I.
la Pompelle/ B.P. 401/ 51064 REIMS Cedex

LANGLET G./ CEA/IRDI/DLPC/SCR/ CEN SACLAY/ 91191 GIF SUR YVETTE

LANGLET J. Mme/ Dynamique des Int. Moleculaires/ Tour 22/ Couloir 22-23/
4, place Jussieu/ 75005 PARIS

LAPLANTE S./ ICSN-CNRS/ 91190 GIF SUR YVETTE

LAUPRETRE F. Mme/ Physico-chimie structurale et macromoleculaire/ ESPCI/
10, rue Vauquelin/ 75231 PARIS Cedex 05

LAVERY R./ I.B.P.C./ 13, rue P. et M. Curie/ 75005 PARIS

LECLERC J.P./ Centre de recherches de Vitry/ B.P. 14/ 13, quai Jules Guesde/
94403 VITRY sur Seine Cedex

LECLERCQ J.M./ Lab. dynamique Int. moleculaires/ UPR A0271 CNRS/ Univ. P.
et M. Curie/ Tour 22/ 4, place Jussieu/ 75252 PARIS Cedex 05

LEGOAS M./ D.C.S.O./ Ecole Polytechnique/ 91128 PALAISEAU

LEMARECHAL C./ INRIA/ B.P. 105/ 78153 LE CHESNAY

LLUCH J./ Depart. Quimica/ Univ. Autonoma de Barcelona/ 08193 BELLATERRA
(Barcelona)/ (Espagne)

LOOS M./ Lab. Chimie theorique/ Univ. Nancy 1/ B.P. 239/ 54506 VANDOEUVRE
LES NANCY

MAISSIAT C./ Societe Biomodele/ CEI/ 15, rue Carnot/ BP 175/ 86004 POITIERS
Cedex

MAJOUBE M./ CEA-DLPC/ CEN SACLAY/ 91191 GIF SUR YVETTE

MALLIAVIN T. Mlle/ DESO/ Ecole Polytechnique/ 91128 PALAISEAU Cedex

MARK F./ Max Planck Inst. fur Strahlenchemie/ Stiftstr 34-36/ D 4330
MULHEIM/ (RFA)

MAROUN R./ Inst. Biologique/ Pharmacochimie moleculaire/ 4, av. de l'Observatoire/
75006 PARIS

MASSEY K./ FPS COMPUTING/ 21, rue des Cevennes/ Silic 523/ 94633 RUNGIS

MATHIEX J.P./ CRAY RESEARCH FRANCE/ 7, rue de Tilsitt/ 75017 PARIS

MATHIS II./ Chimie theorique/ Univ. Nancy 1/ B.P. 239/ 54506 VANDOEUVRE LES
NANCY

MATTALIA J.M./ Chimie inorganique/ Fac. des Sciences St Jerome/ 13397
MARSEILLE Cedex 13

MAUREL J.L./ Procida/ Groupe Roussel-Uclaf/ ST MARCEL/ 13011 MARSEILLE

MAZEAU K./ CERMAV, CNRS/ B.P. 53X/ 38041 GRENOBLE Cedex

MEI1ANI S./ BP RESEARCH Center/ Chertsey Road/ Sunbury on Thames/ MIDDLESEX,
LDN, TO 13 7 LN/ UK

MIKOU A. Mlle/ ICSN/ CNRS/ Lab. RMN/ rue de la Terrasse/ 91190 GIF SUR YVETTE

MILLOT C./ Lab. Chimie theorique/ Univ. Nancy 1/ B.P. 239/ 54506 VANDOEUVRE
LES NANCY

MILON A./ URA 31-CNRS/ Centre de Neurochimie/ 5, rue Blaise Pascal/ 67084
STRASBOURG

MONQUE R./ INTEVEP SA/ Apdo 76343/ CARACAS 1070 A/ (Venezuela)

MONTEIL P./ CONVEX/ Parc d'Activites du Pas du Lac/ Immeuble le Daguerre/
9, av. Ampere/ 78180 MONTIGNY LE BRETONNEUX

MOREAU G./ ROUSSEL UCLAF/ 102, route de Noisy/ 93230 ROMAINVILLE

MORGANTINI P.Y./ Depart. chimie physique/ Univ. de Geneve/ 30, quai E.
Ansermet/ 1211 GENEVE 4 (Suisse)

MORIZE I. Mme/ Rhone Poulenc Sante/ Modelisation moleculaire/ 13, rue Jules
Guesde/ 94403 VITRY sur Seine Cedex

MOY L./ Aquitaine Systemes/ Tour Elf/ 2, place de la Coupole/ 92078 PARIS
LA DEFENSE

MULLER N. Mme/ Inst. Henri Beaufour (IHB)/ 17, av. Descartes/ 92350 LE PLESSIS
ROBINSON

NICKLAS K./ Inst. fur Physikalische Chemie/ TH Darmstadt/ Petersenstr.
20/ 6100 DARMSTADT/ (RFA)

OHLENBUSCH H./ Inst. de Physique biologique/ 4, rue Kirschleger/
67085 STRASBOURG Cedex

PANTEL G./ Rhone Poulenc Sante/ Centre de recherches de Vitry/ BP 14/
13, quai Jules Guesde/ 94403 VITRY sur SEINE Cedex

PENTENERO A./ UER des Sciences de la matiere/ Univ. de Nancy 1/ B.P. 239/
54506 VANDOEUVRE LES NANCY

PERAHIA D./ Lab. d'Enzymologie physico chimique et moleculaire/ Univ. Paris
Sud/ Bat. 430/ 91405 ORSAY

PEPE G./ CRMC2-CNRS/ Campus Luminy/ Case 913/ 13288 MARSEILLE Cedex

PESQUER M./ Physicochimie theorique/ CNRS-UA 503/ Univ. Bordeaux 1/ 33405
TALENCE Cedex

PETTITT B.M./ Chemistry department/ Univ. of Houston/ 4800 CALHOUN/ HOUSTON
TEXAS 77204-5641/ (USA)

PIRIOU J.M./ Biochimie-Enzymologie/ Inst. Gustave Roussy/ 39, rue Camille
Desmoulins/ 94805 VILLEJUIF Cedex

PLOCKYN M./ Jobin Yvon/ 16-18 rue du Canal/ 91160 LONJUMEAU

POLTEV V./ Inst. of Biological Physics/ USSR Academy of Sciences/ Pushchino,
Moscow Region/ 142292 USSR

PONCIN M./ IBPC/ 13, rue P. et M. Curie/ 75005 PARIS

POTHIER J./ Biologie Enzymologie/ Inst. Gustave Roussy/ 39, rue Camille
Desmoulins/ 94805 VILLEJUIF Cedex

PREMILAT S./ Biophysique moleculaire/ B.P. 239/ 54506 VANDOEUVRE LES NANCY

PULLMAN A. Mme/ Inst. de Biologie Physico Chimique/ 13, rue P. et M. Curie/
75005 PARIS

PULLMAN B./ Inst. de Biologie Physico chimique/ 13, rue P. et M. Curie/
75005 PARIS

RAHMOUNI A./ Chimie theorique/ Inst. de Chimie/ 1, rue Blaise Pascal/
6700 STRASBOURG

REPELIN Y. Mme/ Ecole Centrale/ Grande voie des vignes/ 92290 CHATENAY
MALABRY

REYNAUD J.C./ CONVEX/ Parc d'Activites du Pas du Lac/ Immeuble le Daguerre/
9, rue Ampere/ 78180 MONTIGNY LE BRETONNEUX

RINALDI D./ Chimie theorique/ Univ. Nancy 1/ B.P. 239/ 54506 VANDOEUVRE
LES NANCY

RIVAIL J.L./ Lab. Chimie theorique/ Univ. Nancy 1/ B.P. 239/ 54506 VANDOEUVRE
LES NANCY

ROBERT B./ CENS/ SBPH/ CEN SACLAY/ 91191 GIF SUR YVETTE

ROZOT M./ Lab. Chimie theorique/ Univ. Nancy 1/ B.P. 239/ 54506 VANDOEUVRE
LES NANCY

RUBIN C. Mlle/ Inst. de Chimie/ 1, rue Blaise Pascal/ 6700 STRASBOURG

RUIZ M./ Chimie theorique/ Univ. Nancy 1/ B.P. 239/ 54506 VANDOEUVRE LES
NANCY

RULLMANN J.A.C./ Department of Organic Chemistry/ Univ. of Utrecht/ Padualaan
8/ 3584 CH UTRECHT/ (Pays Bas)

SAARMETS A./ Synthelabo Recherche/ 31, av. P. Vaillant Couturier/ 92200
BAGNEUX

SCHLENKRICH M./ Inst. fur Physikalische Chemie/ TH Darmstadt BRD/ Petersenstr.
20/ 6100 DARMSTADT/ (RFA)

SCHWAAB F./ CRIN/ B.P. 239/ Univ. Nancy 1/ 54506 VANDOEUVRE LES NANCY

SHUSTOROVICH E./ Corporate Research Lab./ Eastman Kodak Company/ Corl
B-81/ ROCHESTER, NY 14650-02001/ (U.S.A.)

SIRI D./ CRMC2-CNRS/ Campus Luminy/ Case 913/ 13288 MARSEILLE Cedex 9

SMITH J./ CISI-CEA/ Service de Biophysique/ Depart. de Biologie/ 91191
GIF SUR YVETTE

SOLMAJER T./ Boris Kidric Chemistry Inst./ Hajdrihova 19/ 6100 LJUBLJANA
SLOVENIA/ (Yougoslavie)

SOULIE E./ IRDI/DESICP/DIPC/SCM/ CEN SACLAY/ 91191 GIF SUR YVETTE

SOUMPASIS D.M./ Max Planck Inst. fur Bioph. Chemistry/ P.O. Box 2841/
3400 GOTTINGEN/ (R.F.A.) and Los Alamos Nat. Lab./ LOS ALAMOS 87545/ (USA)

STAWARZ B./ Centre de Biophysique Moleculaire/ 1A, av. de la Recherche
Scientifique/ 45071 ORLEANS Cedex 2

STONE A.J./ Univ. Chemical Lab./ Lensfield Road/ CAMBRIDGE CB2 1EW (G.B.)

SUN Jian-Sheng/ Biophysique, Inserm 201/ CNRS UA 481/ Museum National
d'Histoire Naturelle/ 43, rue Cuvier/ 75031 PARIS

SURCOUF E. Mme/ Rhone Poulenc Sante/ Centre de recherches de Vitry/ BP
14/ 13, quai Jules Guesde/ 94403 VITRY sur Seine Cedex

TAYLOR J./ BIOSYM Technologies Inc./ 15, av. Victor Hugo/ 75116 PARIS

TORRENS F./ Departament de Quimica Fisica/ Univ. de Valencia/ Dr. Moliner
50/ 46100 BURJASSOT (Espagne)

TRINQUIER G./ Physique quantique/ Univ. Paul Sabatier/ 118, route de Narbonne/
31062 TOULOUSE Cedex

TROYANOWSKY C./ Division de Chimie physique/ 10, rue Vauquelin/ 75005
PARIS

URBAIN D./ FPS COMPUTING/ 21, rue des Cevennes/ Silic 523/ 94633 RUNGIS

VAISMAN I.I./ Inst. of Non-Aqueous Solutions Chemistry of the USSR Academy
of Sciences/ Akademicheskaya Street 1/ 153045 IVANOVO/ (U.S.S.R.)

VAN GUNSTEREN W.F./ Depart. of Physical Chemistry/ Univ. of Groningen/
Nyenborgh 16/ 9747 AG GRONINGEN/ (Pays Bas)

VERGELATI C./ Rhone Poulenc Recherches/ Centre de recherches des Carrieres/
85, av. des Freres Perret/ B.P. 62/ 69192 SAINT FONS Cedex

VERGOTEN G./ Fac. de Pharmacie Lille/ INSERM U279/ 3, rue du Professeur
Laguesse/ 59045 LILLE Cedex

VIGNE F. Mme/ I.R.C./ 2, av. Albert Einstein/ 69626 VILLEURBANNE Cedex

MOLECULAR MODELLING, with or without QUANTUM CHEMISTRY?


Bernard PULLMAN, Institut de Biologie Physico-Chimique, 13, rue Pierre et
Marie Curie, 75005 Paris (France)


"A good model is worth its weight in gold."

Francis Crick in "What Mad Pursuit: A Personal View of Scientific Discovery",
Basic Books Inc. Publishers, New York, 1988, p. 86.


Coming from Francis Crick, the above quotation is obviously inspired by his personal experience. That experience is one of the greatest discoveries ever made in biology, one that revealed an essential secret of the mechanism of life on earth. For all those engaged, or willing to engage, in molecular modelling, the quotation is therefore an exceptional encouragement.

It also reminds us of a fact which, strangely or amusingly, seems to be ignored by many young biologists and even forgotten by some older ones. When they discovered the double-helical structure of DNA, Watson and Crick had never carried out any experimental work on this biopolymer, neither by X-ray crystallography nor by any other means. Even better, to quote Crick (ref. 1, p. 67): "One of the oddities of the whole episode is that neither Jim nor I were officially working on DNA at all. I was trying to write a thesis on the X-ray diffraction of polypeptides and proteins, while Jim has ostensibly come to Cambridge to help John Kendrew crystallize myoglobin". Elsewhere (ref. 1, p. 65) he writes: "Following Pauling's example, we believed the way to solve the structure was to build models. The London workers followed a more painstaking approach." The London workers to whom Crick refers are Maurice Wilkins and Rosalind Franklin. The painstaking approach was the constant refinement of the X-ray diffraction patterns.

In the last quotation Crick refers to Pauling's example. What he has in mind is Pauling's discovery of the α-helical structure of polypeptides, then very fresh (1951) and another milestone in the history of modern molecular biology. In relation to the title of my lecture, this may be the first good case to look at more deeply.

One may recall that in the middle of the century two important laboratories were striving (competing?) to establish the structure of polypeptides and proteins by building models that would make it possible to interpret correctly the then available X-ray diagrams, of keratin in particular. These two groups were those of Pauling and collaborators in California, and of Bragg, Perutz and Kendrew in Cambridge, England. As is well known, it was the first of these groups which won the race. What was the essential advantage which gave Pauling the victory? A very simple piece of knowledge which he possessed, owing at least in large part to his quantum chemical approach, namely the planarity of the peptide bond. This planarity results from the partial double bond character of its C-N link, which itself results from the "resonance" (conjugation) of the π-electrons of the C=O double bond with the lone pair electrons of the N atom. This knowledge greatly simplified the model building and played an important role in leading to the α-helix. This is perhaps the first example of a decisive intervention of a quantum chemical concept in the modelling of a molecular system of important (in this case fundamental) biological significance. The English group lacked this knowledge and was consequently too permissive about the flexibility of the system, which was one of the reasons that prevented it from finding the correct answer. The core of the story is, however, still more perplexing, amusing or shocking, depending on how you look at it, especially for those of us who are quantum chemists and especially for those who belong to the somewhat older generation. I am quoting again from Crick (ref. 1, p. 58): "Perutz learned that after one of his seminars a local physical chemist had told him that the peptide group ought to be planar. Perutz has even recorded it on his notes, but had done nothing about it. It was not that they had not tried to get good advice, but some of what they had received had been unfortunate. Charles Coulson, a theoretical chemist from Oxford, had told them, in my hearing, that the nitrogen atom might be "pyramidal", which was a highly misleading piece of information". The name of the physical chemist who gave the good advice is not given, but Coulson, whose blunder is truly unbelievable, is committed to scientific damnation. I wonder whether his name would have been quoted if he had been from Cambridge instead of Oxford. A small consolation to his darkened memory may come from more recent findings which indicate that the planarity of the peptide bond in proteins is not always absolute and that even strong deviations from it may occur at some places in some proteins. As early as 1968 Ramachandran, who had earlier masterfully used the idea of planar peptide bonds to construct his well-known φ, ψ Ramachandran plots (for general information see e.g. ref. 2), published a paper bearing the very title "the need for nonplanar peptide units in polypeptide chains". In fact, later developments have shown that only a very small energy expense is needed to twist the peptide bond by up to 20-30° (see e.g. ref. 2). But these are refinements, and Coulson's blunder remains historical.
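The twist of the peptide bond mentioned above is conventionally measured by the torsion angle ω about the C-N bond (atoms Cα-C-N-Cα), defined like Ramachandran's φ and ψ as a dihedral angle over four consecutive atoms; a planar trans peptide has ω = 180°. As a minimal sketch, assuming atoms given as hypothetical Cartesian coordinates rather than data from any structure cited here, the standard four-atom dihedral formula can be written as follows (the function names are our own):

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180].

    Positive when, looking along p1->p2, the front bond must be turned
    clockwise to eclipse the rear bond (IUPAC sign convention).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane (p0, p1, p2)
    n2 = _cross(b2, b3)   # normal to the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def peptide_twist(ca1, c, n, ca2):
    """Deviation (degrees) of omega = dihedral(CA, C, N, CA) from the planar trans value 180."""
    return 180.0 - abs(dihedral(ca1, c, n, ca2))
```

For example, with the first three points (0, 1, 0), (0, 0, 0), (1, 0, 0), a fourth point (1, -1, 0) gives a planar trans arrangement (|ω| = 180°, twist 0), while (1, cos 160°, sin 160°) gives ω = 160°, i.e. a twist of 20°, at the lower end of the range quoted above.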

Coming back to the DNA double helix, was quantum chemistry also of any (or some) significance in its discovery? The answer is again yes, and it again resides in a piece of knowledge based on or supported by quantum mechanics which turned out to be decisive for the construction of the correct model. This time the piece of knowledge relates to the correct representation of the dominant tautomeric forms of the nucleic acid bases, in particular with respect to their possible lactam-lactim (keto-enol) tautomerism.

The representation prevalent at that time (circa 1950), and commonly found then in all textbooks, described the oxygen-containing bases (G, C and T or U) as existing in the lactim (enol) form. As is obvious to all of us today, this situation precluded the correct representation of the hydrogen-bonded base-pairing scheme (G-C, A-T) and consequently the building of the correct model of DNA. It was over this difficulty that Jim Watson stumbled for months, until the fateful day on which an American visitor at the Cambridge laboratory, Jerry Donohue (this time the name is known), told him that, in his opinion, the bases existed in the lactam (keto) form. Now, quoting from Jim Watson's celebrated book (ref. 3, p. 192): "He (Jerry Donohue) admitted that only one crystal structure bore on the problem (preference for the keto form). This was diketopiperazine whose three dimensional configuration had been carefully worked out in Pauling's Laboratory several years before. Moreover he felt sure that the quantum-mechanical arguments which showed why diketopiperazine has the keto form should also hold for guanine and thymine. I was thus urged not to waste more time with my hare-brained scheme". Be that as it may (and we shall come back to this point immediately), "after that the (correct double helical) model was almost inevitable" (ref. 1, p. 85) and, indeed, the next day the double helix was born (ref. 3, p. 194).

I was intrigued by the "quantum-mechanical arguments" referred to in the above quotation and went to look at the original paper (ref. 4). I am sorry to admit that the arguments I found there, while suggestive, do not appear completely persuasive. Diketopiperazine, as seen in (I), is a cyclic dipeptide which contains two "isolated" peptide bonds, in the sense that they are separated by saturated carbons and are thus not involved in the "general" molecular "resonance" (electronic delocalization) typical of the "conjugated" electronic systems of the heteroaromatic nucleic acid bases. The "arguments" advanced by Corey refer essentially to the electron delocalization within each peptide bond, which determines its planarity, and to the observed overall quasi-planarity of the molecule. They could not, therefore, prejudge what may happen for peptide bonds inserted within a larger conjugated system such as that characteristic of the nucleic acid bases. The situation in the latter type of compounds, as it could be considered objectively at that time, may be inferred from the following paragraph in our book "Les Théories Électroniques de la Chimie Organique" (Masson Eds., Paris, p. 259), published in 1952, thus just on the eve of the discovery of the double helix: "The classic examples of lactam-lactim tautomerism (in conjugated compounds) are provided by isatin, XXXVIIa and XXXVIIb, saccharin, XXXVIIIa and XXXVIIIb, carbostyril, XXXIXa and XXXIXb, etc. From the point of view of the sum of the bond energies, the lactam form is more stable than the lactim form by about 10 kcal/mol. On the other hand, in the compounds indicated, the lactim form has a slightly greater resonance energy than the lactam form, owing to the transfer of the double bond into the ring. The exact position of the tautomeric equilibrium in these compounds is, however, unknown".


[Formulae XXXVIIa, XXXVIIb (isatin); XXXVIIIa, XXXVIIIb (saccharin); XXXIXa, XXXIXb (carbostyril): lactam-lactim tautomeric pairs]


It thus seems that Donohue's "arguments" were in fact partly an inspired guess. His generalization turned out, indeed, to be correct and thus played a role in shaping the history of biology. [The general problem of the tautomerism of the purine and pyrimidine bases nevertheless turned out to be important for some biological problems, e.g. mutations. At a time when experimental studies were difficult, quantum theory played a role in exploring various features of this phenomenon. (For general reviews see refs. 5,6.)]

And what about Pauling? Well, remarkable chemist as he was, he was then following a completely erroneous track, proposing a model for DNA composed of three helices with the sugar-phosphate backbone in the center and the bases at the periphery. The most astonishing feature of the model, surprising for the brilliant quantum chemist that he was, and still is, was that the phosphate groups of his model were not ionized. On the contrary, the hydrogens of the phosphates were used to form an H-bonding scheme that held the three intertwined chains together. Take these H-bonds away and the structure collapses. As it did. As Watson says (ref. 3, p. 160): "Pauling's nucleic acid was not an acid at all". An astonishing blunder.

Enough about the past. And what about the present? Certainly a useful question at a time when molecular modelling is becoming an important instrument of research in chemistry, biology, pharmacology, etc. Is quantum chemistry still of use in this respect, or can it be superseded by more classical approaches? You will certainly not be astonished if I defend the significance of quantum chemistry in this respect, and I will illustrate my point with a very topical example.

It concerns DNA again and relates to the search for factors responsible for sequence selectivity in the interaction of groove-binding antitumor ligands with this biopolymer.

The most representative and best known ligands of this series, exemplified by netropsin and distamycin A (Fig. 1), show a marked triple specificity for binding: 1) to the minor groove, 2) of AT sequences, 3) of B-DNA (for recent reviews see refs. 7,8).

The current, and historically first, proposal attributes this specificity essentially to hydrogen bond formation between the peptidic NH groups of the drugs and the O2 atoms of thymine and/or N3 atoms of adenine situated in the minor groove of DNA. Within this proposal, the charged end groups of the drugs are also considered to be involved in the interaction, probably with the phosphate groups of DNA. The reality of the hydrogen bonds is confirmed by X-ray studies of the crystal structures of complexes of netropsin with the double-helical DNA dodecamers CGCGAATTBrCGCG (ref. 11) and (CGCGATATCGCG)2 (ref. 12), and by NMR studies of the association of netropsin and distamycin A with AT sequences in oligonucleotides (refs. 13,14).

That the situation is, however, more complicated than this simple picture suggests, and that in particular the precise role of the hydrogen bonds in determining specificity has to be reconsidered, becomes evident from the examination of other molecules. Thus, in particular, the bisquaternary ammonium heterocycle SN 18071 (Fig. 1), which has no hydrogen-bonding possibilities, also binds to DNA and shows a similar AT minor groove specificity (refs. 15,16).
Fig. 1. Typical AT minor groove binding ligands.
An indication that the source of the specificity common to these diverse drugs (and many others) may reside to a large extent in the properties of the minor groove of AT sequences of B-DNA, rather than in special features of the drugs, was first suggested (ref. 17) by the observation that the grooves are the sites of the deepest molecular electrostatic potential in DNA, that for AT sequences the deepest potential occurs in their minor groove, and that the deepest potentials in DNA are those of the minor groove of AT sequences (refs. 18,19). One could then conceive that, provided this groove offers an appropriate steric fit to the drugs involved, the origin of specificity could reside in the combination of this fit with a correspondingly strong electrostatic interaction.

The molecular electrostatic potential referred to above is a typical quantum-mechanical index of the electronic structure of molecules. It defines (refs. 20,21) the electrostatic (Coulomb) potential created in the neighbouring space by the nuclear charges and the electronic distribution of the system under investigation. For a given wave function with the corresponding electron distribution function ρ(i), the value V(P) of this potential at a given point P in space is given by:

$$V(P) = \sum_a \frac{Z_a}{r_{aP}} - \int \frac{\rho(i)}{r_{iP}}\, dv_i$$

where Z_a is the charge of nucleus a, and r_aP and r_iP are the distances from P to nucleus a and to the volume element dv_i, respectively. This quantity has the double advantage of being directly obtainable from the wave function and of being an expression of the global molecular reality, clearly related to what a reactant "feels" upon approaching the substrate.
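As a minimal numerical sketch of this definition, V(P) = Σ_a Z_a/r_aP − ∫ ρ(i)/r_iP dv_i, the code below evaluates it for the simplest case we could choose, a hydrogen atom in its 1s ground state (Z = 1, ρ(r) = e^(−2r)/π, atomic units); this system is an illustrative assumption and not one of the molecules discussed here. For a spherically symmetric density the volume integral reduces, by the shell theorem, to a sum over thin shells, each contributing its charge divided by max(r, R), where R is the distance from P to the nucleus. For a real molecule ρ has no such symmetry and the integral must be evaluated in three dimensions. The function names are hypothetical, and the numerical result can be checked against the closed form V(R) = (1 + 1/R)e^(−2R).

```python
import math

def rho_h1s(r):
    """Hydrogen 1s electron density in atomic units: rho(r) = exp(-2r) / pi."""
    return math.exp(-2.0 * r) / math.pi

def mep_spherical(R, Z, rho, r_max=30.0, n=30000):
    """Electrostatic potential (atomic units) at distance R (bohr) from a nucleus
    of charge Z surrounded by a spherically symmetric electron density rho(r).

    Nuclear term: Z / R. Electronic term: by the shell theorem, a thin shell of
    radius r and charge dq acts like a point charge at the nucleus when r < R,
    and gives the constant dq / r when r > R, i.e. dq / max(r, R).
    """
    h = r_max / n
    electronic = 0.0
    for k in range(n):
        r = (k + 0.5) * h                          # midpoint of the k-th shell
        dq = 4.0 * math.pi * r * r * rho(r) * h    # electrons in the shell
        electronic += dq / max(r, R)
    return Z / R - electronic

def mep_h1s_exact(R):
    """Closed-form potential of the hydrogen atom ground state: (1 + 1/R) exp(-2R)."""
    return (1.0 + 1.0 / R) * math.exp(-2.0 * R)
```

At R = 1 bohr both functions should give about 0.2707 (the exact value is 2e^(−2)); the potential of this neutral atom is positive everywhere because the nuclear charge is never fully screened at finite distance.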

The hypothesis of the decisive significance of this potential for the selective interaction of the minor groove binding ligands with DNA was rapidly confirmed by explicit computations of the interaction energies between a number of compounds of the type illustrated in Fig. 1 and model poly(dA).poly(dT) and poly(dG).poly(dC) duplexes in the B-DNA conformation. Whatever the approximations used in the computations (free-space interaction (ref. 22), interaction in solution (refs. 23,24)), the results invariably show, for all the compounds investigated, that out of the four binding possibilities, namely the AT minor groove, AT major groove, GC minor groove and GC major groove, the greatest values of the interaction energy are obtained for the minor groove of the AT sequences. The details of the computations indicate that this preference is due predominantly to the electrostatic component of the interaction energy. This demonstrated that, whatever the significance of hydrogen bonds for the stability of the complex, the formation of these bonds is necessary neither for binding nor for the preference for the minor groove of the AT sequences of B-DNA. It seems, in conformity with the original hypothesis, that provided a steric fit can be obtained in the minor groove, the ligand will be sufficiently stabilized there by the favourable electrostatic potential generated by the AT sequences. When possible, hydrogen bonds between the proton-donating sites of the ligand and the proton-accepting sites of the macromolecule are, of course, formed and contribute to the binding energy, as indicated by the greater values of that energy for netropsin and distamycin A than for SN 18071. Of interest for the discussion that follows, the details of the theoretical results show, moreover, that the charged ends of the ligands lie in the groove and do not interact directly with the phosphates, a result confirmed by the X-ray analysis of ref. 11. It may also be useful to add, for the same reason, that in the most refined computations the difference in the interaction energies of netropsin with the minor grooves of the AT and GC sequences amounts to approximately 17 kcal/mole in favour of the former.

Because of the great importance of hydrogen bonds for the specific base pairings within the nucleic acids, the demonstration that the situation is more complex for the specificity of the binding of external ligands to these acids is obviously of great interest. This interest was further strengthened by the lack of success of attempts which, within the classical concept of the decisive role of H-bonds, tried to modify the structure of the AT-selective drugs, in particular netropsin, so as to make them GC specific.

Two proposals along these lines deserve particular mention.

1) The first, due to Dickerson and collaborators (ref. 25), suggested that modifying netropsin by replacing its pyrrole(s) by imidazole(s), the ring nitrogen of which may form a hydrogen bond with the NH2 group of guanine, could yield analogues capable of preferentially recognizing GC base pairs. Such analogues were named "lexitropsins". Fig. 2 indicates the chemical formulae and the names of three representative derivatives of this class.


Netropsin: X = CH, Y = CH
Lex A: X = N, Y = CH
Lex B: X = CH, Y = N
Lex AB: X = N, Y = N


Fig.  2.  Lexitropsins 

The theoretical exploration of this problem (ref. 26) did not confirm this qualitative suggestion. The computations show that although the complexation energy of the three lexitropsins with poly(dA).poly(dT) decreases progressively as first one and then two pyrroles are replaced by imidazoles, and although the complexation energy of these ligands with poly(dG).poly(dC) increases progressively in the same circumstances, the preference for the AT sequence is conserved for all the lexitropsins, although it diminishes with the number of imidazoles incorporated in place of pyrroles. Furthermore, details of the computations show that the dominance of the AT sequence over the GC sequence in all these interactions is due essentially to the greater value of the electrostatic component of the interaction energy with the former, thus re-emphasizing the significance, in this "specificity", of the stronger concentration of the molecular electrostatic potential in the minor groove of the AT sequence. Obviously the new hydrogen bonds of the lexitropsins are incapable of compensating completely for the initial electrostatic advantage (17 kcal/mole) of the minor groove of AT over that of GC sequences.

An experimental exploration of the binding and specificities of such lexitropsins (refs. 27,28) fully agreed with the above theoretical analysis and thus substantiates the important role of the molecular electrostatic potential in groove-binding interactions.

2) The second proposal, due to Goodsell and Dickerson (ref. 29), concerns another group of related ligands called isolexins, which are pyrrole-amine and pyrrole-ketone analogues of netropsin in which the netropsin backbone is shortened by eliminating either the C=O or the N-H group of the amide units, in order to make them isohelical with DNA. It was again suggested that by suitably placing pyrrole or furan rings in these ligands, systems could be obtained capable of reading the desired DNA fragments through appropriate hydrogen bonds between the H-bond donors and acceptors of the two interacting entities, following the scheme illustrated in Fig. 3.


Fig.  3.  Isolexins  designed  to  bind  to  the  GAG  sequence. 

It must be underlined that the isolexins in fact not only shorten the ligand to dimensions isohelical with DNA but also imply a substantially modified scheme of interactions with this biopolymer: this scheme now involves only the heteroaromatic rings (and possibly the end groups) and no longer the groups bridging the rings (as was the case in netropsin), which, whether C=O or N-H, are directed towards the exterior of the complex.

The detailed theoretical exploration of this system has led to a series of new, particularly striking results (refs. 30,31), which show that the potential specificity of isolexin-type drugs for AT or GC minor groove binding depends to a large extent, if not decisively, on a much greater number of factors than the formal possibilities of hydrogen bond formation drawn in Fig. 3, some of which are most clearly understood through the quantum chemical approach.

Thus:

1) The dominant role of the high molecular electrostatic potential in the minor groove of AT sequences is again demonstrated by the fact that when both end groups R of the isolexin of Fig. 3 are cationic (e.g. R = -CH2-CH2-C(NH3)+), this "furan-pyrrole-furan" isolexin, although designed to bind to a GAG sequence, in fact prefers to bind to the regular AAA triplet.

2) When the R groups are neutral (e.g. R = CH3), a particularly striking phenomenon occurs which usefully illustrates the significance of quantum-chemical notions for the correct description of this type of interaction. In the proposal of Goodsell and Dickerson (ref. 29), the only important factor is the shortening of the amide linkage, and no distinction is made as to whether this occurs by retaining the C=O or the N-H group. The nature of these groups (which we shall refer to as linkers), oriented towards the outside and thus taking no direct part in the binding with DNA, was not considered to have any possible significance for the variation of specificity.

That this could be, but need not a priori be, the situation is evident if one considers the differences in the electronic properties of the C=O and N-H groups, in particular when they are engaged in conjugation (resonance) with π electronic systems, as is the case in isolexins. Quantum chemistry (see e.g. ref. 32) teaches us that while >C=O withdraws π electrons from such systems, >N-H provides them with an excess of such electrons. It also indicates that, considering both the π and σ electrons, the C=O bond is associated with a dipole moment of about 2.4 D, with the overall direction C+O−, while the N-H bond is associated with a dipole moment of only about 1.3 D, with the overall direction N−H+. This situation must lead to significantly different dipole moments, both in magnitude and in direction, in isolexins using the C=O or the N-H linkers. The natural question then arises: could these different electronic properties of the linking C=O or N-H groups of isolexins have any significant effect on the binding efficiencies and specificity in the interaction of these compounds with DNA?

The answer to this question is provided by theoretical computations (ref. 30), which clearly show the significance of this situation: while the neutral "furan-pyrrole-furan" isolexin with two C=O linkers continues to prefer binding to the pure AT sequence, the ligand with two N-H linkers shows a preference for the GAG triplet.

This striking reversal of the classical situation (AT specificity) is all the more impressive as it is obviously related to the nature of the linking groups, C=O or N-H, which, as remarked above, are oriented externally with respect to the oligonucleotide and do not take part directly in the binding process.

Inspection of the results shows that this different behaviour of the two lexitropsins springs essentially from the different evolution of the relatively small but decisive electrostatic component of their DNA-ligand interaction energy. A more elaborate interpretation can be given in terms of the interaction of the oppositely oriented C=O and N-H dipoles with the field of the oligonucleotide acceptors (ref. 30).
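The sign reversal implied by this dipole interpretation can be sketched with the classical energy of a rigid point dipole in a uniform field, U = −μ·E. This is only a toy model, not the computation of ref. 30: it assumes a hypothetical uniform field of 10^9 V/m directed along the outward axis of the linker, and uses the bond dipole magnitudes quoted above (2.4 D for C=O, 1.3 D for N-H). With the dipole vector taken from the negative to the positive end, the C=O dipole (O outward) points inward and the N-H dipole (H outward) points outward, so the two are antiparallel. The function and constant names are our own.

```python
import math

DEBYE_TO_C_M = 3.33564e-30      # 1 debye in coulomb-metres
AVOGADRO = 6.02214076e23        # per mole
J_PER_KCAL = 4184.0             # thermochemical calorie

def dipole_field_energy_kcal(mu_debye, field_v_per_m, theta_deg):
    """Energy U = -mu * E * cos(theta) of a point dipole in a uniform field, in kcal/mol."""
    mu = mu_debye * DEBYE_TO_C_M
    u_joule = -mu * field_v_per_m * math.cos(math.radians(theta_deg))
    return u_joule * AVOGADRO / J_PER_KCAL

# Hypothetical geometry: field along the outward direction of the linker.
FIELD = 1.0e9   # V/m, illustrative value only
u_co = dipole_field_energy_kcal(2.4, FIELD, 180.0)   # C=O dipole points inward
u_nh = dipole_field_energy_kcal(1.3, FIELD, 0.0)     # N-H dipole points outward
```

With these assumptions u_co is about +1.15 kcal/mol and u_nh about −0.62 kcal/mol. Reversing the assumed field direction swaps the signs; the point of the sketch is only that two linkers with antiparallel dipoles respond with opposite signs to the same field, which is consistent with the qualitative interpretation given above.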

3) The significance of the linking groups in this process was further underlined by the demonstration (ref. 31) that the GC specificity of isolexins may be significantly increased by using neutral -C=C- groups as linkers (yielding substances which we called vinylexins), to the point that the neutral vinylexins of Fig. 4, composed of three five-membered proton-acceptor heteroaromatic rings, have been proposed as appropriate for binding to a homogeneous GGG sequence.

4) The efficiency and specificity of binding also depend on the nature of these heteroaromatic rings, the imidazole ring (Fig. 4b) being superior from that point of view to furan (Fig. 4a).


Fig.  4.  Vinylexins  designed  to  bind  to  the  GGG  sequence. 

5) Finally, a complementary demonstration of the role of the end groups was given (ref. 31) by showing the enhancing effect, upon both the specificity and the binding energy, of replacing one of the terminal neutral groups of vinylexins by a cationic substituent.

These studies thus demonstrated the insufficiency of considerations based solely on the notions of geometrical fitting and hydrogen-bonding capabilities for a correct estimation of the GC versus AT specificity of groove-binding ligands. Because of the existence in B-DNA of an intrinsic bias in favour of the minor groove of AT sequences with respect to GC sequences, due to the concentration of a deeper electrostatic potential in the former, correct evaluations of binding specificities can only be made by taking into consideration the overall electronic properties of the interacting species and explicitly calculating the energetics of DNA complex formation, including all the relevant contributions. In these considerations and calculations, quantum mechanical concepts and methods play an essential role.

To conclude, it may be interesting to add very recent experimental support for some of the main proposals developed here. Thus, the favourable effect on GC specificity of reducing the positive charge of groove-binding ligands from two to one was recently demonstrated by a study of appropriate monocationic lexitropsins (ref. 33). Similarly, the favourable effect of vinyl linkers in this respect was illustrated in a study by Baguley (ref. 34) of a series of bisquaternary ammonium heterocycles of the type shown in Fig. 5, in which this author showed that the replacement of an amide linkage at position B, or of an imine linkage at position A, by a vinyl linkage strongly decreases the AT selectivity of this type of drug. There was, of course, no chance that these dicationic compounds could show GC specificity, but the decrease of AT specificity due to the vinyl linkers substantiates our hope that in an appropriate skeleton, such as that of the monocationic vinylexins, they could produce GC-specific ligands. Conversely, one may say that Baguley's results show the particularly strong AT-favouring effect of the amide linkage, which agrees well with our demonstration (ref. 30) of the significance in this respect of the interaction of the dipole moment of this linkage with the electrostatic field of DNA.


Fig.  5.  Bisquaternary  ammonium  heterocycles  of  Baguley. 

Finally, I would like to take advantage of this occasion to express my opinion on a surprising statement found recently in "Guidelines for Publications in Molecular Modeling related to Medicinal Chemistry" (ref. 35). Section g of the introduction states: "As previously recommended by the IUPAC with respect to QSAR studies, authors should REFRAIN FROM PUBLISHING PREDICTED ACTIVITIES OF SPECIFIC UNKNOWN STRUCTURES, since that may compromise the patentability of such structures if they are active, so there is reduced incentive to synthesize these materials". To the annoyance of my students and collaborators, upon reading this prescription I first replied that, on the contrary, they should feel happy with it, because if this is true they are now free to publish any crazy prediction they wish, as nobody is going to try to verify it anyway. More seriously, it seems astounding that anybody who believes that he has, after painstaking efforts, found a structure which may predictably be of importance for chemotherapeutic activity (say in cancer) should be advised to keep it in his drawer, in order not to prevent a hypothetical experimentalist from stumbling upon it, possibly accidentally, in some unknown future. The cynicism of the "Guidelines" is in this respect shocking and the attitude they represent immoral. Unless we seriously consider a piece of advice by Crick (ref. 1, p. 110), who writes: "The best a theorist can hope to do is to point an experimentalist in the right direction, and this is often best done by suggesting what directions to avoid". Taking as an example our theoretical search for GC minor-groove-specific isolexins described above, this would mean that we should have published only our negative results, obtained for C=O linkers and dicationic end groups, stressing that we publish only the negative ones, thus leaving to a prospective experimentalist the chance to deduce from such a presentation the possible positive results which could be obtained with N-H linkers and neutral or monocationic end groups. Such a procedure could preserve our right to a courteously hidden intellectual priority and, at the same time, the right of the prospective experimentalist to the possible benefits of a successful patent. Such a compromise requires, however, a confidence in the honesty and intellectual abilities of the latter which few theorists are probably ready to risk.
DISCUSSION


MAROUN - For hexitropsin AB, and for the neutral isolexins, for example, your
calculations show a complexation energy difference between the AT and the GC
sequences of ~3.5 kcal/mole. In other instances, the differences amount to 29
kcal/mole. Are the quantities in the former cases significant enough to draw
conclusions concerning the sequence specificities of this family of compounds?

B. PULLMAN - I believe that I have given the answer to this question in my talk itself.
Because we thought that a difference of specificity of 3.5 kcal/mole in our prototype
was not significant enough (although significant nevertheless), we worked on
improving it by looking for structural modifications of the ligands which would do just
that, and we succeeded. It is certain that small indices of specificity are less reliable
than large ones. By large ones I mean equal to or above 6-7 kcal/mole.




Modelling of Molecular Structures and Properties. Proceedings of an International Meeting,
Nancy, France, 11-15 September 1989, J.-L. Rivail (Ed.)

Studies  in  Physical  and  Theoretical  Chemistry,  Volume  71,  pages  17-26 
©  1990  Elsevier  Science  Publishers  B.V.,  Amsterdam  —  Printed  in  The  Netherlands 


GENERAL  THEORY  OF  INTERMOLECULAR  FORCES 


A.D.  BUCKINGHAM 

University  Chemical  Laboratory,  Lensfield  Road,  Cambridge,  CB2  1EW,  U.K. 



SUMMARY

There is much current interest both in inter- and intra-molecular potential energy surfaces. The
structure and properties of van der Waals molecules and clusters provide an important source of
information about molecular interactions. This information can be used to generate
intermolecular potentials that can be useful in descriptions of larger systems, such as condensed
phases, solutions of surfactants, and biomacromolecules. It is convenient to divide
intermolecular potentials into long-range and short-range components. The former are related via
perturbation theory to the charge distribution and polarizabilities of the free molecules, and the
resulting long-range potentials vary as an inverse power of the separation between the molecules.
The short-range interactions result from the overlap of the electron clouds of the interacting
molecules and diminish exponentially with the separation. By dividing the molecules into atoms
or small groups of atoms, it is possible to obtain convenient and convergent representations of
the potential. The approach can be used to provide a theoretical basis for the popular site-site
potential models that are now used extensively in computer simulations. Attention will be paid to
the additivity or non-additivity of potentials, to the role of the solvent in solutions, and to
changes in the electronic properties, such as the dipole moment, that result from intermolecular
forces.


INTRODUCTION 

Intermolecular forces have an important role in many branches of science, including
physics, chemistry, molecular biology, materials science, and crystallography. Attractive forces
are responsible for the existence of liquids and solids and for the structure of
biomacromolecules. The repulsive forces that operate through the overlap of electron clouds
when atoms are close together determine nearest-neighbour distances and the densities of liquids
and solids, as well as their compressibility. Much quantitative information about intermolecular
potentials is available, and in most cases this has been obtained through the study of small
molecules, in particular from pairwise interactions, as in a van der Waals dimer. This
accurate information can be applied to the study of larger molecules, particularly by
dividing the large molecule into smaller parts involving groups of atoms that occur in simpler
systems.

Even for small molecules, difficulties exist. For example, the interaction of two water
molecules in the water dimer (H2O)2 involves a potential surface depending on twelve
coordinates. These are: the distance $R$ between the centres of mass; the angles $\theta_a$ and $\theta_b$ between
the dipole axes $\mu_a$ and $\mu_b$ of each H2O and the intermolecular vector $\mathbf{R}$; the azimuthal angle $\phi$
between the planes containing $\mu_a$ and $\mathbf{R}$ and $\mu_b$ and $\mathbf{R}$; the orientations $\chi_a$ and $\chi_b$ of each H2O
about its dipole axis; and the three internal vibrational degrees of freedom in each H2O
molecule, involving the OH bond lengths and the HOH angle. Thus even in this case
simplifications are essential. In the general case of interacting molecules containing $N_a$ and $N_b$
nuclei, there are $3(N_a + N_b) - 6$ independent variables (except when $N_a = N_b = 1$, when there is
just one variable, the internuclear distance $R$).

Of these $3(N_a + N_b) - 6$ degrees of freedom, $(3N_a - 6)$ and $(3N_b - 6)$ are associated with
the internal coordinates of the molecules a and b (when both are non-linear), and the remaining 6 describe the relative
positions and orientations of the two molecules: the so-called intermolecular degrees of
freedom.
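The counting above can be sketched as a small helper. The function name is hypothetical and not from the text. Like the $3N - 6$ count itself, it assumes both molecules are non-linear polyatomics. Atoms and linear molecules would need 0 and $3N - 5$ internal coordinates instead.

```python
def count_degrees_of_freedom(n_a: int, n_b: int) -> dict:
    """Split the 3(Na + Nb) - 6 coordinates of a pair a, b as in the text.

    Assumes both molecules are non-linear polyatomics, so each has 3N - 6
    internal coordinates (atoms and linear molecules would need 0 and 3N - 5).
    """
    if n_a < 3 or n_b < 3:
        raise ValueError("sketch only covers non-linear polyatomic molecules")
    total = 3 * (n_a + n_b) - 6
    internal_a = 3 * n_a - 6
    internal_b = 3 * n_b - 6
    return {
        "total": total,
        "internal_a": internal_a,
        "internal_b": internal_b,
        # relative position (3) plus relative orientation (3)
        "intermolecular": total - internal_a - internal_b,
    }


# Water dimer (H2O)2: three nuclei in each monomer.
print(count_degrees_of_freedom(3, 3))
# {'total': 12, 'internal_a': 3, 'internal_b': 3, 'intermolecular': 6}
```

The intermolecular count comes out as 6 for any pair of non-linear molecules. Only the internal part grows with molecular size.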

Fortunately,  there  is  an  important  simplification  and  this  is  that  the  change  in  the 
molecular  structure  and  properties  as  a  result  of  the  interaction  is  generally  small.  We  can 
identify  the  molecule  a  and  the  molecule  b  in  the  interacting  system,  and  we  can  use  perturbation 
theory  and  our  knowledge  of  free  a  and  free  b  to  describe  the  interaction  of  a  and  b.  This 
perturbation approach is valid provided the effects of intermolecular exchange are small, and that
is  the  case  in  the  'long-range'  region.  Overlap  of  the  electron  clouds  causes  a  loss  of  'identity' 
of  the  molecules  -  it  is  associated  with  the  pair  ab  and  limits  our  ability  to  relate  the  interaction  to 
the  properties  of  free  a  and  b. 

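The Hamiltonian of an interacting pair a, b is

$$H = H_a + H_b + H_{\mathrm{int}} \qquad (1)$$

where $H_a$ and $H_b$ are the Hamiltonians for free a and free b and

$$H_{\mathrm{int}} = \sum_{i \in a} \sum_{j \in b} \frac{e_i^{(a)} e_j^{(b)}}{4\pi\varepsilon_0 R_{ij}} \qquad (2)$$

is the Coulombic interaction of the charges of molecules a and b; $R_{ij}$ is the distance between
the $i$th charge $e_i^{(a)}$ in molecule a and the $j$th charge $e_j^{(b)}$ in b.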
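Eqn (2) is a double sum that pairs every charge of a with every charge of b. The sketch below evaluates that sum classically, treating the charges as fixed point charges. This point-charge picture, the atomic units ($4\pi\varepsilon_0 = 1$) and the function name are illustrative assumptions. In the text, $H_{\mathrm{int}}$ is a quantum-mechanical operator.

```python
import math


def coulomb_interaction(charges_a, positions_a, charges_b, positions_b):
    """Classical value of H_int, eqn (2), for point charges (atomic units).

    Charges in units of e, positions in bohr, 4*pi*eps0 = 1, result in hartree.
    Only pairs (i in a, j in b) are summed: charges within one molecule belong
    to H_a or H_b, not to the interaction.
    """
    energy = 0.0
    for e_i, r_i in zip(charges_a, positions_a):
        for e_j, r_j in zip(charges_b, positions_b):
            energy += e_i * e_j / math.dist(r_i, r_j)
    return energy


# Two hypothetical "molecules", each a pair of charges -0.5 and +0.5 one bohr
# apart, lined up head-to-tail along x with centres 10 bohr apart.
charges = [-0.5, 0.5]
u = coulomb_interaction(charges, [(-0.5, 0, 0), (0.5, 0, 0)],
                        charges, [(9.5, 0, 0), (10.5, 0, 0)])
print(u)  # close to the point-dipole value -2 * 0.5**2 / 10**3 = -5.0e-4
```

The small difference from the point-dipole value comes from the higher multipoles of the finite charge pairs. It shrinks as the separation grows.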

In the general theory of long-range intermolecular forces (refs. 1-2), $H_{\mathrm{int}}$ is treated as a
perturbation to $H_a + H_b$, so the unperturbed basis is non-interacting a and b. The unperturbed
wavefunction in the non-degenerate case is a product $\psi_{m_a}\psi_{m_b}$, where $\psi_{m_a}$ and $\psi_{m_b}$ are the
vibronic states of free a and free b. Generally $\psi_{m_a}$ and $\psi_{m_b}$ will be the ground states, but
sometimes we are interested in excited electronic or vibrational states.

The perturbed wavefunction is

$$\psi = \psi_{m_a}\psi_{m_b} - {\sum_{p_a,p_b}}' \frac{\langle \psi_{p_a}\psi_{p_b} | H_{\mathrm{int}} | \psi_{m_a}\psi_{m_b} \rangle}{E_{p_a} + E_{p_b} - E_{m_a} - E_{m_b}}\, \psi_{p_a}\psi_{p_b} + \cdots \qquad (3)$$

where $\sum'$ denotes a sum over the complete set of unperturbed states $\psi_{p_a}\psi_{p_b}$ with the
exception of the initial state $\psi_{m_a}\psi_{m_b}$. The first-order perturbed wavefunction therefore consists
of the addition to the unperturbed function of a small amount of all those unperturbed states of
free a and b that are mixed with $\psi_{m_a}\psi_{m_b}$ by the perturbation $H_{\mathrm{int}}$. The extent of the admixture is
given by eqn (3).

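The energy of the pair is

$$W = E_{m_a} + E_{m_b} + \langle \psi_{m_a}\psi_{m_b} | H_{\mathrm{int}} | \psi_{m_a}\psi_{m_b} \rangle - {\sum_{p_a,p_b}}' \frac{|\langle \psi_{p_a}\psi_{p_b} | H_{\mathrm{int}} | \psi_{m_a}\psi_{m_b} \rangle|^2}{E_{p_a} + E_{p_b} - E_{m_a} - E_{m_b}} + \cdots \qquad (5)$$

The first-order energy in (5) is the unperturbed expectation value of $H_{\mathrm{int}}$ and is the
electrostatic energy $U_{\mathrm{electrostatic}}$.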


The second-order energy may be separated into two distinct contributions. The first,
$U_{\mathrm{induction}}$, consists of all those terms in which either $p_a = m_a$ with $p_b \neq m_b$, or $p_b = m_b$ with
$p_a \neq m_a$. The other second-order contribution is $U_{\mathrm{dispersion}}$, and it comprises the remaining terms,
in which $p_a \neq m_a$ and $p_b \neq m_b$:


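$$U_{\mathrm{induction}} = U^{(a)}_{\mathrm{induction}} + U^{(b)}_{\mathrm{induction}} \qquad (6)$$

where

$$U^{(a)}_{\mathrm{induction}} = -{\sum_{p_a}}' \frac{|\langle \psi_{p_a}\psi_{m_b} | H_{\mathrm{int}} | \psi_{m_a}\psi_{m_b} \rangle|^2}{E_{p_a} - E_{m_a}} \qquad (7)$$

$$U^{(b)}_{\mathrm{induction}} = -{\sum_{p_b}}' \frac{|\langle \psi_{m_a}\psi_{p_b} | H_{\mathrm{int}} | \psi_{m_a}\psi_{m_b} \rangle|^2}{E_{p_b} - E_{m_b}} \qquad (8)$$

$$U_{\mathrm{dispersion}} = -{\sum_{p_a}}'{\sum_{p_b}}' \frac{|\langle \psi_{p_a}\psi_{p_b} | H_{\mathrm{int}} | \psi_{m_a}\psi_{m_b} \rangle|^2}{E_{p_a} + E_{p_b} - E_{m_a} - E_{m_b}} \qquad (9)$$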




If $\psi_{m_a}$ and $\psi_{m_b}$ are ground electronic states, then $U_{\mathrm{dispersion}}$ is necessarily negative,
leading to an attractive force between a and b; $U^{(a)}_{\mathrm{induction}}$ is also negative if molecule b in its
unperturbed state produces a non-vanishing electric field at a, and similarly for $U^{(b)}_{\mathrm{induction}}$.
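Eqns (5)-(9) can be checked on a model small enough to diagonalize exactly. The sketch below is a hypothetical toy: its energy levels and matrix elements are invented, and $H_{\mathrm{int}}$ is taken as a product of one-body operators, loosely mimicking a dipole-dipole coupling. It does three things:

- It sorts the second-order terms into induction and dispersion, following eqns (6)-(9).
- It confirms that each of these terms is negative for ground states when the "permanent" diagonal elements are non-zero.
- It compares the first- plus second-order energy with the lowest eigenvalue of the full Hamiltonian. That eigenvalue is found by power iteration on a shifted matrix.

```python
import itertools
import math

# Hypothetical toy model, not taken from the text: molecule a has three
# levels, molecule b two, and H_int = lam * (M_A x M_B), a product of real
# symmetric "dipole-like" operators whose diagonal elements play the role of
# permanent moments.
E_A = [0.0, 1.0, 1.5]
E_B = [0.0, 0.8]
M_A = [[0.3, 0.5, 0.2], [0.5, -0.1, 0.4], [0.2, 0.4, 0.0]]
M_B = [[0.2, 0.6], [0.6, -0.3]]
BASIS = list(itertools.product(range(len(E_A)), range(len(E_B))))


def h_int(pa, pb, qa, qb, lam):
    """Matrix element <pa pb | H_int | qa qb> in the product basis."""
    return lam * M_A[pa][qa] * M_B[pb][qb]


def perturbation_terms(lam, ma=0, mb=0):
    """First-order energy and the second-order split of eqns (6)-(9)."""
    terms = {"electrostatic": h_int(ma, mb, ma, mb, lam),
             "induction_a": 0.0, "induction_b": 0.0, "dispersion": 0.0}
    for pa, pb in BASIS:
        if (pa, pb) == (ma, mb):
            continue  # the primed sum excludes the initial state
        term = -h_int(pa, pb, ma, mb, lam) ** 2 / (
            E_A[pa] + E_B[pb] - E_A[ma] - E_B[mb])
        if pb == mb:
            terms["induction_a"] += term   # only a excited: eqn (7)
        elif pa == ma:
            terms["induction_b"] += term   # only b excited: eqn (8)
        else:
            terms["dispersion"] += term    # both excited: eqn (9)
    return terms


def lowest_eigenvalue(h, shift=5.0, n_iter=500):
    """Lowest eigenvalue of a real symmetric matrix by power iteration on
    shift*I - h; assumes shift lies above the whole spectrum of h."""
    n = len(h)
    v = [1.0] * n
    for _ in range(n_iter):
        w = [shift * v[i] - sum(h[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    hv = [sum(h[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(vi * hvi for vi, hvi in zip(v, hv))  # Rayleigh quotient


def full_hamiltonian(lam):
    """H_a + H_b + H_int as an explicit matrix in the product basis."""
    return [[(E_A[pa] + E_B[pb] if (pa, pb) == (qa, qb) else 0.0)
             + h_int(pa, pb, qa, qb, lam)
             for qa, qb in BASIS] for pa, pb in BASIS]


lam = 0.01
terms = perturbation_terms(lam)
exact_shift = lowest_eigenvalue(full_hamiltonian(lam)) - (E_A[0] + E_B[0])
print(terms)
print("exact:", exact_shift, "first + second order:", sum(terms.values()))
```

For small `lam` the remaining discrepancy is of third order in `lam`. Setting `M_B[0][0]` to zero gives b no "permanent moment", and `induction_a` then vanishes. This mirrors the condition stated in the text for $U^{(a)}_{\mathrm{induction}}$.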

If $\psi_{m_a}\psi_{m_b}$ is degenerate, then it is necessary to select the unperturbed zero-order
wavefunction that diagonalizes $H$. The degeneracy may be lifted by $H_{\mathrm{int}}$, in which case there is
a resonance energy $U_{\mathrm{resonance}}$, which may be either attractive or repulsive. The first-order
perturbed energy is now the sum $U_{\mathrm{electrostatic}} + U_{\mathrm{resonance}}$. For example, consider an atom a in
an excited state $p_a$ interacting with an identical atom b in its ground state $m_b$. The appropriate
zeroth-order states are (ref. 2)

$$\psi_{+} = 2^{-1/2}\left(\psi_{p_a}\psi_{m_b} + \psi_{m_a}\psi_{p_b}\right) \qquad (10)$$

and

$$\psi_{-} = 2^{-1/2}\left(\psi_{p_a}\psi_{m_b} - \psi_{m_a}\psi_{p_b}\right) \qquad (11)$$

and there is an exchange of excitation energy due to $H_{\mathrm{int}}$ which splits the energies of these two states
(one g and one u) by $2|\langle \psi_{p_a}\psi_{m_b} | H_{\mathrm{int}} | \psi_{m_a}\psi_{p_b} \rangle|$. If m is an S state and
p a P state, then the longest-range contribution to $U_{\mathrm{resonance}}$ will vary as the inverse cube of the
distance between the atoms, due to a dipolar exchange of a photon. If p were an excited D state,
then the longest-range resonance interaction would vary as $R^{-5}$, due to quadrupolar exchange.
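Within the two-dimensional degenerate space spanned by $\psi_{p_a}\psi_{m_b}$ and $\psi_{m_a}\psi_{p_b}$, the first-order energies follow from a 2×2 secular problem. The worked form below rests on three simplifying assumptions:

- the matrix elements are real;
- the two diagonal elements are equal, as they are for identical atoms;
- any further degeneracy, such as the three components of a P state, is ignored.

$$D = \langle \psi_{p_a}\psi_{m_b} | H_{\mathrm{int}} | \psi_{p_a}\psi_{m_b} \rangle, \qquad V = \langle \psi_{p_a}\psi_{m_b} | H_{\mathrm{int}} | \psi_{m_a}\psi_{p_b} \rangle$$

$$\begin{vmatrix} D - W^{(1)} & V \\ V & D - W^{(1)} \end{vmatrix} = 0 \;\Longrightarrow\; W^{(1)}_{\pm} = D \pm V, \qquad \psi_{\pm} = 2^{-1/2}\left(\psi_{p_a}\psi_{m_b} \pm \psi_{m_a}\psi_{p_b}\right)$$

Here $D$ plays the role of $U_{\mathrm{electrostatic}}$ and $\pm V$ is $U_{\mathrm{resonance}}$, so the two states are split by $2|V|$. The distance dependence of $V$ is that of $H_{\mathrm{int}}$ between the relevant transition moments: $R^{-3}$ for dipolar and $R^{-5}$ for quadrupolar exchange.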

An interesting example of $U_{\mathrm{resonance}}$ occurs in vibrationally excited van der Waals
molecules $A_2$. Consider the hydrogen fluoride dimer (HF)2, in which one of the HF monomers
is vibrationally excited. The interaction of the monomers means that the vibrational excitation
can be resonantly transferred between a and b, leading to a splitting of the potential surface.

(HF)2 has a well-known inversion doubling of its energy levels, due to the tunnelling motion that
interchanges the proton donor and acceptor in the hydrogen-bonded dimer (ref. 3). The
tunnelling frequency is about 20 GHz, and it is reduced to about one-third of this value in the excited
vibrational states $v_1 = 1$ at 3930.9 cm-1 and $v_2 = 1$ at 3868.3 cm-1 (ref. 4). The barrier height
opposing the tunnelling motion is apparently increased from 300 cm-1 to about 400 cm-1 in $v_1 = 1$
and 391 cm-1 in $v_2 = 1$ (ref. 4). This change in the effective potential energy surface with
vibrational excitation can be attributed to $U_{\mathrm{resonance}}$ (ref. 5). It will affect (HF)2 but not the mixed dimer HFDF,
since there can be no resonance transfer in the latter case. This mass-dependent potential-energy
surface therefore goes beyond the usual clamped-nuclei Born-Oppenheimer picture.

In the ground state the deuterium-bonded species HF...DF is favoured over the
hydrogen-bonded DF...HF (refs. 3, 6), and it has been calculated that this energy difference,
resulting from zero-point vibrations and in particular the bending of the D and H bonds, is about
163 cm-1 (ref. 7). However, in the excited vibrational levels the species HFDF will not
experience a resonance energy as in (HF)2 and (DF)2.
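The figures quoted above mix frequency and wavenumber units. The conversion sketch below puts them on a common footing. The helper names are hypothetical. The value of c is exact by the SI definition, while 1 kcal/mol ≈ 349.755 cm-1 is a rounded factor.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10  # c, exact by the SI definition
CM1_PER_KCAL_PER_MOL = 349.755           # 1 kcal/mol expressed in cm-1 (rounded)


def ghz_to_wavenumber(freq_ghz):
    """Frequency in GHz -> wavenumber in cm-1 (nu / c)."""
    return freq_ghz * 1e9 / SPEED_OF_LIGHT_CM_PER_S


def wavenumber_to_kcal_per_mol(wavenumber):
    """Energy in cm-1 -> kcal/mol."""
    return wavenumber / CM1_PER_KCAL_PER_MOL


print(ghz_to_wavenumber(20.0))            # ground-state tunnelling frequency, about 0.667 cm-1
print(ghz_to_wavenumber(20.0 / 3))        # roughly one-third of it in v1 = 1 or v2 = 1
print(wavenumber_to_kcal_per_mol(163.0))  # HF...DF vs DF...HF difference, about 0.466 kcal/mol
```

On this scale the tunnelling splittings are tiny compared with the 300-400 cm-1 barriers. This is why modest changes in the barrier show up as large relative changes in the tunnelling frequency.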

It is convenient to classify long-range intermolecular interactions as in table 1, which
indicates whether a particular interaction is additive or non-additive, depending on whether the
energy of a cluster of three molecules can or cannot be represented as a sum of the three pair
interactions (ref. 2).

TABLE 1.
A classification of molecular interaction energies

Range   Type                              Attractive (-)      Additive or
                                          or repulsive (+)    non-additive
long    electrostatic                     -/+                 additive
long    induction                         -                   non-additive
long    dispersion                        -                   approximately additive
long    resonance                         -/+                 non-additive
long    magnetic                          -/+                 (weak)
short   overlap (Coulomb and exchange)    -/+                 non-additive

The  non-additive  induction  energy  can  be  important  in  solvent  effects  on  intermolecular 
forces  between  ions  or  between  highly  polar  species.  The  induction  energy  is  determined  by  the 
square  of  the  electric  field  strength  at  the  polarizable  site  and  it  will  therefore  cause  an  attractive 
force  between  ions  of  the  same  sign  and  a  repulsion  between  ions  of  opposite  sign  (ref.  8). 
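The induction energy depends on the square of the total field at the polarizable site, so it cannot be a sum of pair terms. Consider an isotropic atom of polarizability $\alpha$ in the fields $\mathbf{E}_1$ and $\mathbf{E}_2$ of two ions. The cluster energy $-\tfrac12\alpha|\mathbf{E}_1+\mathbf{E}_2|^2$ differs from the two pair energies by the cross term $-\alpha\,\mathbf{E}_1\cdot\mathbf{E}_2$. The sign of this term depends on where the atom sits relative to the ions.

The sketch below works in atomic units with point ions, and its function names are hypothetical. Both example geometries put the atom at the midpoint of the ions, the configuration discussed after this paper.

```python
import math


def field_at(point, charges):
    """Electric field (atomic units) at `point` from point charges [(q, (x, y, z)), ...]."""
    field = [0.0, 0.0, 0.0]
    for q, pos in charges:
        d = [p - s for p, s in zip(point, pos)]
        r = math.sqrt(sum(c * c for c in d))
        for k in range(3):
            field[k] += q * d[k] / r**3
    return field


def induction_energy(alpha, point, charges):
    """Induction energy -alpha |E|^2 / 2 of an isotropic polarizable atom."""
    return -0.5 * alpha * sum(c * c for c in field_at(point, charges))


def three_body_induction(alpha, point, ion_1, ion_2):
    """Cluster induction energy minus the two atom-ion pair energies.

    Equals -alpha * E_1 . E_2; its sign depends on the geometry.
    """
    total = induction_energy(alpha, point, [ion_1, ion_2])
    pairs = (induction_energy(alpha, point, [ion_1])
             + induction_energy(alpha, point, [ion_2]))
    return total - pairs


atom, alpha = (0.0, 0.0, 0.0), 10.0
cation_left, cation_right = (1.0, (-5.0, 0.0, 0.0)), (1.0, (5.0, 0.0, 0.0))
anion_right = (-1.0, (5.0, 0.0, 0.0))
print(three_body_induction(alpha, atom, cation_left, cation_right))  # positive here
print(three_body_induction(alpha, atom, cation_left, anion_right))   # negative here
```

At the midpoint the two cation fields cancel, so the cluster has no induction energy at all. The sum of pair terms would wrongly predict a stabilization. Moving the atom off the line between the ions changes the sign and size of the cross term.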

Limitations  of  the  long-range  description  of  intermolecular  forces 

The  great  advantage  of  the  long-range  description  of  molecular  interactions  is  that  it  is  a 
general  theory  that  expresses  the  interactions  in  terms  of  the  properties  of  the  isolated  molecules. 
The  long-range  energy  or  property,  such  as  the  induced  dipole  moment,  is  obtained  in  any 
particular  case  by  inserting  the  appropriate  monomer  properties  into  the  general  equations,  giving 
the  dependence  of  the  energy  or  property  on  the  intermolecular  degrees  of  freedom.  The 
monomer  charge  distributions  determine  the  electrostatic  energy,  and  their  static  and  dynamic 
polarizabilities  determine  the  induction  and  dispersion  energies,  respectively  (ref.2).  Some  of 
these  properties  are  accurately  known  from  experiment,  and  others  are  amenable  to  ab  initio 
computation. 
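As an illustration of inserting monomer properties into the general equations, consider two leading terms: the dipole-dipole electrostatic energy and the dipole-induced-dipole induction energy. Both depend only on $\mu$, $\alpha$, $R$ and the angles $\theta_a$, $\theta_b$, $\phi$ introduced above for (H2O)2. The sketch uses atomic units and treats each molecule as a point dipole with an isotropic polarizability. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def dipole_dipole_energy(mu_a, mu_b, r, theta_a, theta_b, phi):
    """Electrostatic energy of two point dipoles (atomic units).

    theta_a, theta_b: angles between each dipole and the vector R from a to b;
    phi: azimuthal angle between the (mu_a, R) and (mu_b, R) planes.
    """
    return mu_a * mu_b / r**3 * (
        math.sin(theta_a) * math.sin(theta_b) * math.cos(phi)
        - 2.0 * math.cos(theta_a) * math.cos(theta_b))


def dipole_induction_energy(alpha_a, mu_b, r, theta_b):
    """U_induction^(a): isotropic molecule a polarized by the dipole field of b."""
    # |E_b|^2 at a is mu_b^2 (3 cos^2 theta_b + 1) / r^6
    return -0.5 * alpha_a * mu_b**2 * (3.0 * math.cos(theta_b) ** 2 + 1.0) / r**6


# Head-to-tail (both dipoles along R) versus side-by-side antiparallel.
print(dipole_dipole_energy(1.0, 1.0, 2.0, 0.0, 0.0, 0.0))                      # -2/8
print(dipole_dipole_energy(1.0, 1.0, 2.0, math.pi / 2, math.pi / 2, math.pi))  # -1/8
```

The rotations $\chi_a$ and $\chi_b$ about the dipole axes do not appear at this level. They enter only through higher multipoles, such as the quadrupoles, and through anisotropic polarizabilities.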

But what are the limitations to this description? At what separations R do the overlap
effects cause significant departures from the long-range formulation? In an attempt to answer
this important question, we carried out a series of elaborate ab initio computations on the system
Ne...HF (ref. 10). The calculations were performed within the self-consistent field (SCF)
framework, using large basis sets; the results are therefore close to the Hartree-Fock limit, but
they exclude any effects of electron correlation (so dispersion forces are absent). An earlier
calculation on Ne...HF by Losonczy et al. (ref. 11) had led to the conclusion that covalency (a
short-range overlap effect) made a dominant contribution to the energy at the potential minimum,
and that long-range theory was applicable only when the separation R of the Ne and HF
molecules is so large that the interaction is negligible.

However, the important role attributed to covalency in Ne...HF (ref. 11) is actually an
artefact due to basis set superposition error. By using the counterpoise technique of Boys and
Bernardi (ref. 12), it has been shown (ref. 10) that, even down to the minimum-energy
separation of Ne...HF, the long-range theory accounts well for the interaction energy U(R) and
for the dipole moment μ(R).

The  problem  of  computing  the  electrostatic  and  induction  energies  of  van  der  Waals 
molecules  has  largely  been  overcome  by  using  the  method  of  distributed  multipoles  and 
distributed  polarizabilities,  introduced  by  Stone  (refs.  13-14).  This  approach  has  yielded  a 
reliable  practical  means  of  calculating  the  structures  (ref.  15)  and  dipole  moments  (ref.  16)  of  van 
der  Waals  molecules.  We  are  currently  investigating  whether  this  'long-range'  approach  will 
give  good  dipole  and  polarizability  derivatives,  and  hence  infrared  and  Raman  intensities. 

The  long-range  forces  are  balanced  by  the  repulsive  short-range  forces  at  equilibrium, 
and  these  are  not  amenable  to  evaluation  by  perturbation  theory.  Various  simple  models  have 
been  adopted  (refs.9,15)  but  there  remains  a  need  to  obtain  better  representations  of  these 
interactions. 
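A simple choice consistent with the Summary is an exponential short-range repulsion plus an inverse-power long-range attraction, $U(R) = A e^{-bR} - C_6/R^6$. The sketch below finds where the two forces balance by bisection on $dU/dR$. The parameters are invented, in atomic units. This is an illustration only: it is not one of the models of refs. 9 and 15, whose forms the text does not give.

```python
import math

# Hypothetical parameters in atomic units (hartree, bohr), for illustration only.
A_REP = 100.0   # prefactor of the exponential short-range repulsion
B_REP = 2.0     # decay constant of the repulsion (1/bohr)
C6 = 65.0       # long-range R^-6 attraction coefficient


def potential(r):
    """Short-range repulsion A exp(-b r) plus long-range attraction -C6 / r^6."""
    return A_REP * math.exp(-B_REP * r) - C6 / r**6


def d_potential(r):
    """dU/dr; it vanishes where the two forces balance."""
    return -A_REP * B_REP * math.exp(-B_REP * r) + 6.0 * C6 / r**7


def find_equilibrium(lo, hi, tol=1e-12):
    """Bisection for dU/dr = 0, given dU/dr < 0 at lo and > 0 at hi."""
    if d_potential(lo) >= 0.0 or d_potential(hi) <= 0.0:
        raise ValueError("[lo, hi] does not bracket a potential minimum")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_potential(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r_e = find_equilibrium(5.0, 7.0)
print(r_e, potential(r_e))
```

Bisection needs a bracket on which $dU/dR$ changes sign from negative to positive. At very small $R$ the $-C_6/R^6$ term of this form eventually overwhelms the repulsion. The bracket must therefore stay clear of that unphysical region.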
DISCUSSION


SOUMPASIS - Prof. Buckingham's example of the interaction of two ions and a solvent
molecule applies to isolated systems (few particles). In large systems the dielectrically
screened Coulomb interaction (continuum model) is capable of describing many
properties, owing to statistical averaging of the detailed interactions.

BUCKINGHAM - I agree that my example of a polarizable atom at the midpoint of (a)
an anion and a cation and (b) two cations represents a single configuration and
therefore excludes the influence of entropy. It should not be taken to represent ionic
interactions in an aqueous solution. Nevertheless, it illustrates the deficiency of the
dielectric continuum model, which would treat (a) and (b) similarly and which always
reduces the magnitude of the interaction between the ions. In case (a) the interaction
between the ions is actually enhanced by the presence of a water molecule or
polarizable atom between the charges.


RIVAIL - Could you comment a bit further on the transferability of atomic or bond
quantities which govern intermolecular energies from one molecular species to
another and, occasionally, from intermolecular to intramolecular energies (in the
case of macromolecules)?

BUCKINGHAM - I believe that accurate experiments and computations on small
systems, particularly van der Waals molecules, provide the firm basis upon which we
can  build  our  approximate  descriptions  of  the  interactions  in  macromolecules  and 
between  large  molecules.  The  parameters  describing  the  charge  distribution  and 
polarizabilities  of  atoms  and/or  groups  of  atoms  are  likely  to  be  similar  in  appropriate 
cases,  so  we  should  be  able  to  gain  at  least  a  qualitative  understanding  of  the 
potential  energy  surface  in  a  particular  case  from  a  knowledge  of  the  corresponding 
parameters  in  small  molecules.  For  example,  as  a  first  guess  for  the  distributed 
multipoles  in  a  carbonyl-containing  molecule,  one  could  use  the  values  computed  for 
formaldehyde.  You  have  raised  a  large  and  very  important  question  that  needs  careful 
and  extensive  investigation. 
DURUP - In the case of two cations with two polarizable molecules, where
second-order effects are important and where a set of other solvent molecules has to
be considered, how large should the whole system be for the theoretical
treatment to converge?


BUCKINGHAM - There are two aspects to your question. The first is how large a
sample needs to be to reproduce the continuum results. For two cations the
convergence in the case of a spherical sample will go as $R^{-1}$, and for an uncharged
pair (cation + anion) it will be faster and go as $R^{-3}$, where R is the radius of the sphere. The
second part asks how high in powers of the polarizability α one has to go to achieve
convergence. It depends on $\alpha(4\pi\varepsilon_0 R^3)^{-1}$, where R is the distance of the atom from the
charges. For a linear system with charges q at $\pm R_0$ and polarizable atoms at $\pm R$, the
induction energy can be obtained exactly as

$$U_{\mathrm{induction}} = -\alpha\left[\frac{2q(R^2+R_0^2)}{(4\pi\varepsilon_0)(R^2-R_0^2)^2}\right]^2\left[1+\frac{\alpha}{(4\pi\varepsilon_0)4R^3}\right]^{-1}$$

Hence the power series in the polarizability converges fast when $\alpha/[(4\pi\varepsilon_0)4R^3] \ll 1$,
but it converges ever more slowly as this ratio approaches 1 and does not converge at all once the ratio reaches 1,
even though an exact analytical expression exists. In this case, the higher-order interactions diminish the induction
energy, since the fields of the induced dipoles oppose the field of the charges (q).
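The convergence question in this reply can be checked numerically. Suppose the induced dipoles are found by repeated substitution, starting from zero. Each pass then generates the next partial sum of the power series in α. The iteration therefore reaches the closed form when $\alpha/[(4\pi\varepsilon_0)4R^3] < 1$ and diverges when the ratio exceeds 1.

The sketch rests on these assumptions:

- atomic units ($4\pi\varepsilon_0 = 1$);
- the like charges q sit at $\pm R_0$, with the atoms outside them ($R > R_0$);
- the function names are hypothetical.

```python
# Atomic units (4*pi*eps0 = 1). Hypothetical helper names.


def charge_field(q, r0, r):
    """Axial field at +r from equal charges q at +r0 and -r0 (requires r > r0)."""
    if r <= r0:
        raise ValueError("sketch assumes the atoms lie outside the charges")
    return 2.0 * q * (r**2 + r0**2) / (r**2 - r0**2) ** 2


def induction_exact(q, r0, r, alpha):
    """Closed form: -alpha E0^2 / (1 + x) with x = alpha / (4 r^3)."""
    e0 = charge_field(q, r0, r)
    x = alpha / (4.0 * r**3)
    return -alpha * e0**2 / (1.0 + x)


def induction_iterative(q, r0, r, alpha, n_iter):
    """Induced dipoles by repeated substitution, starting from mu = 0.

    By symmetry the atom at -r carries dipole -mu, whose axial field at +r
    is -mu / (4 r^3); each pass adds one more power of alpha.
    """
    e0 = charge_field(q, r0, r)
    mu = 0.0
    for _ in range(n_iter):
        mu = alpha * (e0 - mu / (4.0 * r**3))
    return -mu * e0  # -(1/2) sum_i mu_i E0_i over the two atoms


# x = 10 / (4 * 6**3) is about 0.012: fast convergence.
print(induction_iterative(1.0, 2.0, 6.0, 10.0, 20), induction_exact(1.0, 2.0, 6.0, 10.0))
# x = 1.5: the iteration (the power series) diverges, the closed form does not.
print(induction_iterative(1.0, 2.0, 6.0, 1296.0, 60), induction_exact(1.0, 2.0, 6.0, 1296.0))
```

The same fixed-point iteration is a common way of solving for induced dipoles in polarizable models. This example shows why it can fail even when the exact answer is perfectly well defined.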


DURUP - For the HF dimer, does the tunnelling frequency increase again if both HF
molecules have one vibrational quantum?

BUCKINGHAM - I do not think that the experiment has been done. However, as you
imply, the resonance energy will not be present if both HF monomers are excited into
v=1. This resonance energy is large when one HF has v=1 and the other v=0. It will be
small when one has v=2 and the other v=0, and is zero when both have the same number
of quanta (0, 1, 2, ...).
SOUMPASIS - The effects of various Hamiltonians in describing many-body properties
of classical charged fluids (i.e. electrolytes) have been studied in some detail. The
asymptotically correct (long-range) effective potential of two ions in a polar solvent
(i.e. water) is a dielectrically screened Coulomb form, as assumed in the restricted
primitive model (ions: charged hard spheres; water: dielectric continuum). More
"realistic" solvent modelling (water: hard dipolar spheres) yields a different effective
interaction. But still more realistic modelling (water: hard dipolar + quadrupolar
spheres) yields effective potentials identical to the dielectrically screened Coulomb form over
most distances. The statistical mechanics of dense charged systems is very often
dominated by the long-range part of the effective potential, and therefore refinements at
very short distances (induction, dispersion, etc.) often do not give rise to dramatic
improvements in the description of the condensed phase.

BUCKINGHAM  -  The  point  of  my  simple  example  of  a  polarizable  particle  near  two 
charges  was  to  emphasize  the  deficiency  in  the  dielectric  screening  description  at 
short  range  -  it  may  give  the  wrong  sign.  At  very  long  range,  the  dielectric  screening 
correctly  describes  the  interaction  of  fixed,  or  slowly  fluctuating,  charges.  I  agree  that 
this  long-range  behaviour  is  dominant  in  determining  some  properties.  However,  the 
structures  of  flexible  molecules  containing  charges  and  dipoles  are  dependent  upon 
near-neighbour  interactions  where  the  dielectric  screening  approximation  fails. 
ABSTRACT

It  is  now  possible  to  calculate  the  interaction  energies  of  small  molecular  complexes  by  a  variety 
of  ab  initio  techniques,  though  there  are  several  sources  of  error,  notably  Basis  Set  Superposition  Error, 
which  is  especially  troublesome  when  electron  correlation  is  taken  into  account.  Such  calculations  have 
to  be  carried  out  at  a  wide  range  of  dimer  geometries  if  a  full  description  of  the  potential  energy  surface 
is  needed,  and  this  is  extremely  time-consuming.  For  complexes  involving  larger  molecules  it  is  out  of 
the  question. 

An alternative approach is to isolate the components of the perturbation expansion of the interaction
energy, namely the repulsion, electrostatic, induction, and dispersion terms, and to calculate
each of them independently by the most appropriate technique. Thus the electrostatic interaction can
be calculated accurately from distributed multipole descriptions of the individual molecules, while the
induction and dispersion contributions may be derived from molecular polarizabilities. This approach
has the advantage that the properties of the monomers have to be calculated only once, after which
the interactions may be evaluated easily and efficiently at as many dimer geometries as required. The
repulsion is not so amenable, but it can be fitted by suitable analytic functions much more satisfactorily
than the complete potential. The result is a model of the intermolecular potential that is capable
of describing properties to a high level of accuracy.


THE  SUPERMOLECULE  METHOD 

A natural way to calculate the interaction energy $U_{AB}$ between two molecules A and B is to compare
the energy of the complex A---B with the energies of the separated molecules:

$$U_{AB} = W_{AB} - W_A - W_B. \qquad (1)$$

This is the supermolecule method. It has a number of obvious attractions: it is very easy to understand
conceptually, and it is easy to apply, using standard ab initio computational techniques. Unfortunately
it has some serious limitations. The small interaction energy $U_{AB}$ is calculated as a difference of much
larger quantities, and this is a serious source of inaccuracy. We are all aware of the need to avoid
calculating a small quantity as a difference of large ones for reasons of numerical accuracy, but the
problem here is not a numerical one; it is that all of the energies involved are subject to error, because
of the approximations made in the calculation, and small relative errors in any of the energies lead
to very large percentage errors in the interaction energy. Moreover, the errors in $W_{AB}$ do not cancel
with the errors in $W_A$ and $W_B$. Consequently the calculations must be carried out at a high level of
accuracy if the results are to be of any value. However, this means that a large basis set is needed, and
since the computer time required for ab initio calculations increases roughly with the fourth power of
the number of basis functions, these calculations are very expensive.
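Eqn (1) and the two cost arguments in this paragraph can be made concrete with a little arithmetic. The sketch uses invented round numbers, not results from the text. It shows how a relative error of only $10^{-5}$ in $W_{AB}$ becomes a percentage error of about 20% in $U_{AB}$. It also shows how the rough $N^4$ scaling makes doubling the basis sixteen times as expensive. The function names are hypothetical.

```python
def interaction_energy(w_ab, w_a, w_b):
    """Supermolecule interaction energy, eqn (1): U_AB = W_AB - W_A - W_B."""
    return w_ab - w_a - w_b


def percent_error_in_u(w_ab, w_a, w_b, rel_err_ab):
    """Percentage error in U_AB caused by a relative error rel_err_ab in W_AB alone."""
    u_true = interaction_energy(w_ab, w_a, w_b)
    u_bad = interaction_energy(w_ab * (1.0 + rel_err_ab), w_a, w_b)
    return 100.0 * (u_bad - u_true) / abs(u_true)


def relative_cost(n_basis_new, n_basis_old, power=4):
    """Cost ratio under the rough N^4 scaling with basis size quoted in the text."""
    return (n_basis_new / n_basis_old) ** power


# Invented round numbers in hartree: two monomers of -76.0 and a dimer 0.008 lower.
print(percent_error_in_u(-152.008, -76.0, -76.0, 1e-5))  # about -19 per cent
print(relative_cost(200, 100))                            # 16.0
```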

Furthermore, the calculation has to be repeated at a large number of relative configurations of
the two molecules if the potential surface is to be explored adequately. Six coordinates are required
to specify the position and orientation of molecule B relative to molecule A, if both are non-linear.
If the calculation is repeated for only four values of each coordinate (a very inadequate number), a
total of $4^6 = 4096$ points will be required. There are many examples in the literature of calculations
of intermolecular potential energy surfaces which use a wholly inadequate strategy for choosing points
on the surface. It is common, for example, even in calculations on pairs of diatomics, where there
are only three angular coordinates, to perform calculations at only a few relative orientations: often
only the linear, rectangular, T-shaped and crossed configurations. Even if calculations are done for a
large number of intermolecular distances, it is impossible to characterise the potential energy surface
adequately from such a limited number of orientations. Fortunately it is possible to explore the surface
much more efficiently[1], but a large number of points is still required.
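The count of 4096 follows from sampling every coordinate independently on a direct-product grid. The short sketch below shows how quickly such a grid grows. The helper name and the placeholder sample labels are hypothetical.

```python
import itertools


def direct_product_grid(samples, n_coordinates=6):
    """Every combination of the sampled values, one value per relative coordinate."""
    return list(itertools.product(samples, repeat=n_coordinates))


four_samples = [0, 1, 2, 3]  # placeholder labels for four values of one coordinate
print(len(direct_product_grid(four_samples)))     # two non-linear molecules: 4**6 = 4096
print(len(direct_product_grid(four_samples, 4)))  # two diatomics (R + 3 angles): 4**4 = 256
for n in (4, 6, 10):
    print(n, "values per coordinate ->", n**6, "points")
```

Every point costs a full supermolecule calculation, so the exponential growth of the grid multiplies the already steep per-point cost.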

It  is  often  assumed  that  the  supermolecule  method  provides  the  ultimate  method  for  calculating 
intermolecular  interaction  energies,  and  that  it  provides  a  benchmark  for  all  other  methods.  This  is 
not  entirely  true.  Supermolecule  calculations  are  subject  to  a  number  of  deficiencies,  because  they 
inevitably  use  inadequate  basis  sets  and  do  not  take  full  account  of  electron  correlation. 

A complete treatment of electron correlation, even within a moderate-sized basis set, is impracticable
for all but the smallest calculations. It is therefore necessary to use a limited treatment. The
simplest of these, the so-called ‘singles-and-doubles’ CI, or CISD, takes into account the contributions
to the wavefunction of all singly and doubly excited configurations derived from the SCF ‘root’ configuration.
This is a variational technique, but unfortunately it is not ‘size-extensive’: the energy obtained
for a complex A---B at infinite separation is not the sum of the energies obtained for A and B
individually[2]. The requirement of size extensivity is clearly important if sensible estimates of interaction
energies are required. Although the CISD method does not satisfy this requirement, there are
methods which do; they include Møller-Plesset perturbation theory and a variety of ‘coupled cluster’
methods[2,3]. CEPA (the Coupled Electron Pair Approximation) is an approximation to the latter.
However, none of these is variational, and all are approximate.

The  basis  set  problem  is  common  to  all  ab  initio  calculations,  but  is  particularly  troublesome 
in  the  calculation  of  interaction  energies,  because  even  when  the  energies  of  A,  B  and  A---B  are 
calculated  variationally,  the  difference  (1)  is  not  an  upper  bound  to  the  interaction  energy,  and  it 
is  difficult  to  estimate  the  error  in  it.  A  well-known  deficiency  is  ‘Basis  Set  Superposition  Error’,  or 
BSSE,  which  arises  as  follows. 

We choose a basis for the calculation of molecule A that seems suitable for our purpose. Inevitably
it is incomplete, so our wavefunction for A is not exact. The same is true of the calculation for the
isolated molecule B. Now we carry out a calculation on the complex A---B, and obtain a new energy
that includes the interaction between the molecules. But in this calculation there are some new basis
functions, belonging to molecule B, that were not present when we calculated the energy of the isolated
A molecule, and they allow the wavefunction of molecule A to be improved variationally, so that its
energy falls. This is quite separate from the true physical interaction; it is a spurious effect that occurs
because the basis set that we initially chose for A was not good enough to describe the wavefunction
exactly. This is Basis Set Superposition Error, and its effect is to make the interaction seem more
attractive than it really is.

This is an extremely troublesome problem. The standard way to deal with it is the counterpoise correction proposed by Boys & Bernardi[4]. The reference energy for A is calculated in the presence of the basis functions (but not the electrons or nuclei) of molecule B, and similarly the reference energy for B is calculated in the presence of the A basis functions. In this way, the variational improvement that arises from the presence of the ‘foreign’ basis functions is included in the reference calculation as well as in the A···B calculation. (The basis functions of the other molecule are sometimes called ‘ghost’ orbitals when used like this.)
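
The counterpoise bookkeeping reduces to a few subtractions. The sketch below assumes that the five total energies have already been obtained from some ab initio program; the function name `interaction_energies` and the numerical values are hypothetical and serve only to show the signs involved.

```python
def interaction_energies(e_ab, e_a, e_b, e_a_ghost, e_b_ghost):
    """Return (uncorrected, counterpoise-corrected, BSSE) interaction energies.

    e_ab      : energy of the complex A...B in the full dimer basis
    e_a, e_b  : energies of the isolated monomers, each in its own basis
    e_a_ghost : energy of A with B's basis functions (no electrons or nuclei)
    e_b_ghost : energy of B with A's basis functions
    """
    uncorrected = e_ab - e_a - e_b
    corrected = e_ab - e_a_ghost - e_b_ghost
    # In a variational calculation the ghost functions can only lower the
    # monomer energies, so bsse >= 0: the uncorrected value is too attractive.
    bsse = corrected - uncorrected
    return uncorrected, corrected, bsse


# Hypothetical energies (hartree), chosen only to illustrate the signs.
unc, cp, bsse = interaction_energies(-152.010, -76.000, -76.000, -76.001, -76.0005)
print(f"uncorrected {unc:.4f}, corrected {cp:.4f}, BSSE {bsse:.4f}")
# prints: uncorrected -0.0100, corrected -0.0085, BSSE 0.0015
```

Note that the correction only removes the superposition artefact; it does nothing about the incompleteness of the basis itself.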

It has been pointed out that in the A···B complex some of the orbitals of B are not available to A, because they are occupied by the electrons of B, and it has been argued that in the counterpoise calculation these orbitals should be excluded. There has been a great deal of controversy about this point[5]. This has rather obscured the fact that even if the effects of BSSE could be corrected completely, the calculation would still be in error because of the inadequacy of the basis set and the correlation treatment. In a recent survey of calculations on the water dimer[6], it was concluded that the very best calculations currently available were in error by some 120 cm⁻¹. This is a substantial fraction of the interaction energy.


INTERMOLECULAR  PERTURBATION  THEORY 

If supermolecule calculations are expensive and unreliable for small van der Waals complexes, it is clear that they are useless for calculating interactions involving large molecules such as proteins. However, perturbation theory can provide a more satisfactory approach. It is often assumed that perturbation theory is a relatively crude approach to the calculation of intermolecular potentials, but there is no reason to believe that it is intrinsically any less accurate than the supermolecule approach. The attraction of perturbation theory is that it leads directly to an expression for the interaction energy itself, rather than obtaining it as a small difference between large total energies. Moreover, it is possible to analyse the expression into separate terms that can be correlated with distinct physical effects. We can then refine each of these terms, expressing each of them in terms of properties of the individual molecules that can be calculated much more accurately than properties of the complex. Unfortunately perturbation theory becomes very much more difficult at short range, so it is convenient to distinguish between this and the long-range case, where the overlap between the wavefunctions on different molecules can be neglected.


Long-range  perturbation  theory — first  order 

For two molecules, sufficiently far apart, the Hamiltonian for the system can be written as

$$\mathcal{H} = \mathcal{H}_A + \mathcal{H}_B + V, \qquad (2)$$

where $\mathcal{H}_A$ is the Hamiltonian for the isolated molecule A, $\mathcal{H}_B$ the Hamiltonian for molecule B, and V is the interaction. V arises simply from the electrostatic interaction between the particles of A and those of B:

$$V = \sum_{i \in A} \sum_{j \in B} \frac{e_i e_j}{4\pi\epsilon_0 r_{ij}}, \qquad (3)$$

where $e_i$ and $e_j$ are the charges on particles i and j, and $r_{ij}$ is the distance between them.
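
For fixed point charges, eq. (3) is simply a double sum over pairs with one particle on each molecule. The sketch below treats the particles as classical point charges at given positions (in the quantum-mechanical problem V is an operator over the electron coordinates) and uses atomic units, in which 4πε₀ = 1; the function name and the example geometry are hypothetical.

```python
import math

def coulomb_interaction(particles_a, particles_b):
    """Classical analogue of eq. (3) in atomic units (4*pi*eps0 = 1).

    Each particle is a tuple (charge, (x, y, z)). Only pairs with i in A and
    j in B are summed: interactions within a molecule belong to H_A or H_B.
    """
    v = 0.0
    for e_i, r_i in particles_a:
        for e_j, r_j in particles_b:
            v += e_i * e_j / math.dist(r_i, r_j)
    return v


# Hypothetical example: a unit positive charge on A; a +1/-1 pair on B.
a = [(1.0, (0.0, 0.0, 0.0))]
b = [(1.0, (0.0, 0.0, 2.0)), (-1.0, (0.0, 0.0, 3.0))]
v = coulomb_interaction(a, b)   # 1/2 - 1/3 = 1/6
```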


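If the eigenfunctions of $\mathcal{H}_A$ are $\psi^A_m$ and those of $\mathcal{H}_B$ are $\psi^B_n$, and if there is no overlap between the two sets of functions, then the unperturbed wavefunctions for A···B are $|mn\rangle = \psi^A_m\psi^B_n$. Ordinary Rayleigh-Schrödinger perturbation theory then gives the first-order energy of interaction as

$$U^{(1)} = \langle 00|V|00\rangle \qquad (4)$$

$$U^{(1)} = \iint \frac{\rho^A(\mathbf{r}_1)\,\rho^B(\mathbf{r}_2)}{4\pi\epsilon_0\, r_{12}}\,d^3\mathbf{r}_1\,d^3\mathbf{r}_2, \qquad (5)$$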
which is just the classical electrostatic energy of interaction between the charge distributions $\rho^A$ and $\rho^B$ (nuclei plus electrons) of A and B. It is easily evaluated if the ground-state wavefunctions are known, but the calculation requires a knowledge of all the intermolecular electron repulsion integrals and is fairly time-consuming. If we want to map out the potential energy surface, then it is necessary to repeat the calculation at a large number of relative configurations, just as in the supermolecule method, and this would require an inordinate amount of computer time if it were to be done accurately. The electrostatic interaction is also quite sensitive to the effects of electron correlation, because the individual molecular charge distributions are modified quite significantly when electron correlation is taken into account.


It is often useful to replace the operator (3) by its multipole expansion:

$$V = \sum_{tu} \hat{Q}^A_t\, T^{AB}_{tu}\, \hat{Q}^B_u, \qquad (6)$$

where $\hat{Q}^A_t$ is an operator for one of the multipole moments of molecule A (charge, dipole, quadrupole, etc.) and $T^{AB}_{tu}$ is an interaction function. The index t is an abbreviation for the angular momentum labels lk, and the moment operator $\hat{Q}^A_{lk}$ is defined by

$$\hat{Q}^A_{lk} = \sum_{i \in A} e_i\, r_i^l\, C_{lk}(\theta_i, \phi_i), \qquad (7)$$

where $r_i$, $\theta_i$ and $\phi_i$ are the spherical polar coordinates of particle i in a local coordinate system for molecule A, and $C_{lk}$ is a modified spherical harmonic:

$$C_{lk}(\theta, \phi) = \sqrt{\frac{4\pi}{2l+1}}\; Y_{lk}(\theta, \phi).$$

The derivation of (6) is straightforward[7,8], and is not given here.

If now we use the multipole expansion of the perturbation in the calculation of the first-order energy, eq. (4) becomes simply

$$U^{(1)} = \sum_{tu} Q^A_t\, T^{AB}_{tu}\, Q^B_u, \qquad (8)$$

in which $Q^A_t$ is the expectation value of one of the multipole moments of molecule A, referred to local axes, and the interaction function takes into account the orientation dependence of the interaction as well as the distance between the molecules. The interaction is now described in terms of monomer properties (the multipole moments $Q^A_t$ and $Q^B_u$), which need to be calculated only once, and so can be calculated at a much higher level of accuracy than we could contemplate for calculations on the complete complex. To obtain the interaction energy at any arbitrary configuration it is now necessary only to calculate the interaction functions for that orientation, which is a trivial computation.
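
The practical advantage of eq. (8) is that, once the moments are known, each new configuration needs only the interaction functions. The sketch below keeps only the charge and dipole terms, written in Cartesian form with both dipoles referred to a common set of global axes (rather than local axes, as in the text), in atomic units; the function names and the test geometry are hypothetical. Comparing the result with the exact point-charge sum for a small dipole shows how closely the truncated expansion agrees at long range.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def multipole_energy(q_a, mu_a, q_b, mu_b, r_ab):
    """Charge-charge, charge-dipole and dipole-dipole terms of eq. (8).

    q_a, q_b   : net charges of A and B
    mu_a, mu_b : dipole vectors of A and B (global axes)
    r_ab       : vector from the origin of A to the origin of B
    """
    r = math.sqrt(dot(r_ab, r_ab))
    n = [c / r for c in r_ab]              # unit vector from A to B
    charge_charge = q_a * q_b / r
    # q_b in the potential of mu_a, and mu_b in the field of q_a
    charge_dipole = (q_b * dot(mu_a, n) - q_a * dot(mu_b, n)) / r**2
    dipole_dipole = (dot(mu_a, mu_b) - 3.0 * dot(mu_a, n) * dot(mu_b, n)) / r**3
    return charge_charge + charge_dipole + dipole_dipole


# Molecule A: charges +1 and -1 at z = +0.05 and -0.05 (mu_z = 0.1);
# molecule B: a unit charge at z = 10. All values hypothetical.
approx = multipole_energy(0.0, (0.0, 0.0, 0.1), 1.0, (0.0, 0.0, 0.0), (0.0, 0.0, 10.0))
exact = 1.0 / 9.95 - 1.0 / 10.05
```

Here the remaining difference, about 2.5 × 10⁻⁸, is the charge-octupole term, since the quadrupole of this symmetric pair vanishes.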

We note in passing that it is possible, and much more useful for practical calculations, to use an alternative approach in which the indices t and u refer to real multipole moments, rather than the complex ones defined by (7). The moments are now denoted $Q_{lkc}$ and $Q_{lks}$, defined, for k > 0, by

$$Q_{lkc} = \sqrt{\tfrac{1}{2}}\,\left[(-1)^k Q_{lk} + Q_{l,-k}\right],$$

$$Q_{lks} = -i\sqrt{\tfrac{1}{2}}\,\left[(-1)^k Q_{lk} - Q_{l,-k}\right]. \qquad (9)$$

No transformation is needed for $Q_{l0}$, which is always real. The notation reflects the fact that $Q_{lkc}$ transforms like cos kφ and $Q_{lks}$ like sin kφ. The factors of $\sqrt{1/2}$ ensure that a rotation of axes induces an orthogonal transformation of the moments. The first few of these moments coincide precisely with the Cartesian charge and dipole moment:

$$Q_{00} = q, \qquad Q_{10} = \mu_z, \qquad Q_{11c} = \mu_x, \qquad Q_{11s} = \mu_y, \qquad (10)$$

and later ones describe the quadrupole, octopole and so on. A complete list is given in ref. 7 for moments up to hexadecapole. The functions $T^{AB}_{tu}$ for this formulation have been tabulated[7,9] for all multipole-multipole interactions up to terms in $R^{-5}$.
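
For a set of point charges, the real moments up to the quadrupole can be written out explicitly. The sketch below uses the standard Cartesian forms of the real rank-2 moments (for example $Q_{20} = \sum e(3z^2 - r^2)/2$), which are not listed in the text and should be checked against ref. 7 before use; the function name and test charges are hypothetical. For ranks 0 and 1 it reproduces eq. (10).

```python
import math

def real_moments(charges, origin=(0.0, 0.0, 0.0)):
    """Real spherical multipole moments up to rank 2 for point charges.

    charges: list of (e, (x, y, z)); positions are taken relative to `origin`.
    Returns a dict keyed '00', '10', '11c', '11s', '20', '21c', '21s', '22c', '22s'.
    """
    s3 = math.sqrt(3.0)
    keys = ['00', '10', '11c', '11s', '20', '21c', '21s', '22c', '22s']
    q = dict.fromkeys(keys, 0.0)
    for e, pos in charges:
        x, y, z = (p - o for p, o in zip(pos, origin))
        r2 = x * x + y * y + z * z
        q['00'] += e
        q['10'] += e * z            # mu_z
        q['11c'] += e * x           # mu_x
        q['11s'] += e * y           # mu_y
        q['20'] += e * (3.0 * z * z - r2) / 2.0
        q['21c'] += e * s3 * x * z
        q['21s'] += e * s3 * y * z
        q['22c'] += e * s3 * (x * x - y * y) / 2.0
        q['22s'] += e * s3 * x * y
    return q


dipole_pair = real_moments([(1.0, (0.0, 0.0, 1.0)), (-1.0, (0.0, 0.0, -1.0))])
single = real_moments([(1.0, (1.0, 0.0, 0.0))])
```

The `origin` argument makes the origin dependence of the moments easy to explore: for a neutral system the dipole is unchanged by a shift of origin, but the higher moments are not.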

Although the expression (8) is accurate at sufficiently large separation, it converges only if the separation between the molecules is large compared with their size. This is not the case for any pair of molecules at the separations found in condensed phases or in weakly bound complexes. Fortunately this problem is easily overcome, provided that the molecular charge distributions do not overlap, by using a Distributed Multipole Expansion, which assigns charges, dipole moments, etc., to regions comprising single atoms or (in larger molecules) small groups of atoms. For small molecules it is helpful to treat the bonding regions separately. The electrostatic perturbation then takes the form

$$V = \sum_{ab}\sum_{tu} \hat{Q}^a_t\, T^{ab}_{tu}\, \hat{Q}^b_u, \qquad (11)$$

where $\hat{Q}^a_t$ is the operator for one of the multipole moments of region a of molecule A. The electrostatic energy becomes

$$U^{(1)}_{\mathrm{es}} = \sum_{ab}\sum_{tu} Q^a_t\, T^{ab}_{tu}\, Q^b_u. \qquad (12)$$

In defining these regional moments it is necessary to specify an origin for each region, and the origin for region a is known as ‘site a’. In the Distributed Multipole approach we replace the extended charge distribution of region a by a set of point multipoles at site a, rather than replacing the charge distribution of the entire molecule by a set of point multipoles at the molecular origin.

The principle of the Distributed Multipole Analysis method[10] is that the charge distribution of any molecule can be represented in the form

$$\rho(\mathbf{r}) = \sum_{ij} \rho_{ij}\, g_i(\mathbf{r})\, g_j(\mathbf{r}),$$

where $g_i$ and $g_j$ are Gaussian primitive orbitals of the type used in modern wavefunction calculations and $\rho_{ij}$ is a density matrix element. Now the product of orbitals centred at positions a and b is another Gaussian function centred at a point p on the line from a to b. This product charge distribution can be represented exactly in the long-range limit by a terminating multipole expansion about p. For example, in the case where $g_i$ and $g_j$ are both p functions, the multipole expansion about p terminates at the quadrupole term. If the potential due to the charge distribution is instead represented by a multipole expansion about some other point at distance r from p, the terms in that multipole expansion behave like $(r/R)^n$, where R is the distance at which the potential is required, so the expansion converges rapidly if r is small compared with R.
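
The statement that the product of two Gaussians centred at a and b is a Gaussian centred on the line between them is the Gaussian product theorem. The sketch below verifies it numerically for two s-type primitives; for p or d primitives the product acquires a polynomial prefactor, which is why its multipole expansion about p terminates at a finite rank. The function name, exponents, centres and test point are hypothetical.

```python
import math

def gaussian_product(alpha, a, beta, b):
    """exp(-alpha|r-a|^2) * exp(-beta|r-b|^2) = k * exp(-gamma|r-p|^2).

    Returns (gamma, p, k) for two s-type Gaussian primitives.
    """
    gamma = alpha + beta
    p = tuple((alpha * ai + beta * bi) / gamma for ai, bi in zip(a, b))
    k = math.exp(-alpha * beta / gamma * math.dist(a, b) ** 2)
    return gamma, p, k


alpha, a = 0.8, (0.0, 0.0, 0.0)
beta, b = 1.2, (0.0, 0.0, 1.5)
gamma, p, k = gaussian_product(alpha, a, beta, b)   # p = (0, 0, 0.9)

r = (0.3, -0.2, 0.7)   # arbitrary test point
lhs = math.exp(-alpha * math.dist(r, a) ** 2) * math.exp(-beta * math.dist(r, b) ** 2)
rhs = k * math.exp(-gamma * math.dist(r, p) ** 2)
```

The centre p is weighted towards the more compact (larger-exponent) function, so products involving tight core-like functions sit very close to a nucleus.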

Accordingly  we  choose  a  number  of  sites  in  the  molecule,  usually  but  not  necessarily  at  the 
nuclei,  and  represent  each  overlap  distribution  by  a  multipole  expansion  about  the  nearest  site.  This 
gives  a  description  of  the  charge  distribution  that  has  the  best  possible  convergence  properties  for  the 
given  choice  of  sites.  The  sites  can  be  chosen  to  suit  the  problem  being  investigated  and  the  accuracy 
required;  for  instance,  it  is  often  helpful  in  studies  of  small  molecules  to  add  sites  at  bond  centres, 
while  in  larger  systems  groups  of  atoms  like  a  methyl  group  can  be  represented  by  a  single  site.  In  this 
way  we  reduce  the  elaborate  and  time-consuming  evaluation  of  the  expression  (5)  to  the  very  much 
simpler  calculation  of  (8). 
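
The nearest-site rule can be illustrated with a deliberately simplified sketch in which each overlap distribution is replaced by a point charge at its centre p, and only the charge and dipole it generates about the nearest site are kept. Real DMA moves the full multipole expansion of each Gaussian overlap density, up to high rank, using exact translation formulas; the function name, sites and charges here are hypothetical. The check at the end confirms that moving a charge q from p to site s, while adding a dipole q(p − s), leaves the total charge and total dipole unchanged.

```python
import math

def assign_to_sites(overlap_charges, sites):
    """Move each point charge (q, p) to the nearest site, keeping rank <= 1.

    About site s, a charge q at p is a charge q plus a dipole q*(p - s)
    plus higher moments; the higher moments are dropped in this sketch.
    """
    site_q = [0.0] * len(sites)
    site_mu = [[0.0, 0.0, 0.0] for _ in sites]
    for q, p in overlap_charges:
        k = min(range(len(sites)), key=lambda i: math.dist(p, sites[i]))
        site_q[k] += q
        for c in range(3):
            site_mu[k][c] += q * (p[c] - sites[k][c])
    return site_q, site_mu


sites = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
overlap_charges = [(0.5, (0.0, 0.0, 0.2)), (-0.3, (0.0, 0.0, 0.9)),
                   (-0.3, (0.0, 0.0, 1.1)), (0.1, (0.0, 0.0, 1.9))]
site_q, site_mu = assign_to_sites(overlap_charges, sites)

# Total dipole (z component) from the original charges and from the site model.
mu_direct = sum(q * p[2] for q, p in overlap_charges)
mu_sites = sum(q * s[2] for q, s in zip(site_q, sites)) + sum(m[2] for m in site_mu)
```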

The simplest possible reliable model for the intermolecular interaction between polar molecules is based on the distributed multipole picture, though it is necessary to add some form of repulsive potential. Buckingham & Fowler added a simple hard-sphere repulsion, using standard van der Waals radii, to obtain a model that has been extremely successful in predicting the structures of hydrogen-bonded van der Waals complexes[11]. Several authors have confirmed by using more detailed calculations that the electrostatic interaction is indeed dominant in determining the structures of such complexes[12-14]. The method has also been applied to systems involving aromatic molecules[15], where it is very successful, contradicting the common perception that because such molecules are non-polar, electrostatic effects are unimportant. Of course, it has to be recognised that other effects, especially dispersion, are also important in such systems.

Computation  of  Distributed  Multipoles 


The Distributed Multipole description of a molecular charge distribution is not unique; it depends on the choice of sites and on the precise definition of the region boundaries. (The conventional single-site multipole description is not unique either, since it depends on the choice of origin[16].) There are many ways of determining these distributed multipole moments[10,17-23]. For example, Lavery, Etchebest & Pullman[19] calculated the moments of localised molecular orbitals. By locating the expansion centre at the centroid of the orbital, they ingeniously eliminated the dipole contributions, but unfortunately the convergence properties of this description are poor because the localised orbitals have tails that extend across much of the molecule. Moreover, this description cannot easily be applied to correlated wavefunctions. Cooper & Stutchbury[21] used Bader partitioning of the charge distribution; this has some conceptual attractions, but again the convergence properties of the resulting description are poor[10]. While these and other methods have some merits, the over-riding criterion for the choice of an electrostatic model is the rate of convergence of the resulting multipole expansion, and the DMA procedure was designed to optimise this, as was the technique used by Vigné-Maeder & Claverie[23]. DMA is implemented as a part of the Cambridge Analytic Derivatives Package (CADPAC)[24], and is able to evaluate Distributed Multipoles up to rank 10 if required, at sites chosen by the user. The rank of multipoles on any particular site can be limited, in which case the effects of higher multipoles on that site are described by moments on neighbouring sites. For example, it is often convenient to specify that hydrogen atoms may carry a charge but no dipole or higher moment. The same procedure is also available in the Direct SCF program of Murray & Amos[25], and this makes it possible to perform calculations on molecules containing over 100 atoms, such as small polypeptides and the constituents of liquid crystals. Faerman & Price[26] have used the Direct SCF program to calculate Distributed Multipoles for a number of dipeptides and amides, and have been able to obtain a transferable multipole model for such systems, including moments up to octopole on each heavy-atom site. This transferable model gives a good account of the potential around cyclosporin, an 11-residue polypeptide, which was also studied using the Direct SCF method[27].

There are two popular methods for modelling molecular charge distributions that are best avoided. These are the use of Mulliken populations and ‘potential-fitted point charges’. Mulliken populations have been widely used, but the only argument in their favour is that they are readily available, because they are calculated by almost all ab initio programs. They are widely acknowledged to give a poor description of the molecular charge distribution, but it does not seem to be generally understood why this is so. The reason is that although they give a reasonable account of the distribution of charge between atoms, they take no account of the distortion of individual atoms. That is, they ignore atomic dipoles and quadrupoles. Since the atomic dipoles make a substantial contribution to the molecular dipole, this means that Mulliken populations give wildly inaccurate estimates of molecular dipoles and hence of the long-range electrostatic potential. Since the importance of atomic dipoles and quadrupoles is even greater at short range, the Mulliken charge distribution is useless for modelling short-range interactions.

A better method, by comparison, is the ‘potential-fitted point charge’ approach[28,29]. Here the molecular electrostatic potential is calculated from an ab initio wavefunction, and a point-charge model is derived by fitting atomic charges so as to reproduce the potential. This gives a better description of the charge distribution than Mulliken charges, but as we have seen this is not a remarkable achievement. The method has little else to commend it. Distributed Multipole Analysis[10] has shown that atomic dipoles and quadrupoles make very important contributions to the overall molecular electrostatic potential, especially at short range; a point-charge model can only mimic these effects by assigning charges to neighbouring atoms, and the resulting description is bound to have poor convergence properties at short range. Moreover, the method used to compute the point charges is fairly time-consuming, and is also very indirect. Distributed Multipole Analysis, on the other hand, gives a very fast and direct route from any wavefunction to a description of the charge distribution that represents features such as lone pairs and π electrons in a straightforward and natural way, and is guaranteed to reproduce the electrostatic potential exactly in the long-range limit as well as providing a very good approximation to it at all accessible configurations.
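
To make the fitting step concrete, here is a generic constrained least-squares sketch in the spirit of refs. 28 and 29, not a reproduction of either published scheme (which differ in how grid points are chosen and may add restraints). The charges minimise the squared error in the potential at the grid points, subject to a fixed total charge imposed with a Lagrange multiplier; the function names, the diatomic geometry and the grid are hypothetical, and atomic units are assumed.

```python
import math

def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def fit_charges(sites, grid, potential, total_charge=0.0):
    """Least-squares site charges reproducing `potential` at the `grid` points,
    subject to sum(q) == total_charge (imposed with a Lagrange multiplier)."""
    n = len(sites)
    design = [[1.0 / math.dist(g, s) for s in sites] for g in grid]
    # Normal equations bordered by the charge constraint.
    lhs = [[sum(row[i] * row[j] for row in design) for j in range(n)] + [1.0]
           for i in range(n)]
    lhs.append([1.0] * n + [0.0])
    rhs = [sum(row[i] * v for row, v in zip(design, potential)) for i in range(n)]
    rhs.append(total_charge)
    return solve(lhs, rhs)[:n]


# Hypothetical diatomic with charges +0.4 / -0.4 and a cubic grid of points.
sites = [(0.0, 0.0, -0.6), (0.0, 0.0, 0.6)]
true_q = [0.4, -0.4]
grid = [(x, y, z) for x in (-3.0, 0.0, 3.0) for y in (-3.0, 0.0, 3.0)
        for z in (-3.0, 0.0, 3.0) if (x, y, z) != (0.0, 0.0, 0.0)]
potential = [sum(q / math.dist(g, s) for q, s in zip(true_q, sites)) for g in grid]
fitted = fit_charges(sites, grid, potential)
```

Because the reference potential here was generated from point charges at the sites, the fit recovers them exactly. Had it been generated from charges plus atomic dipoles, no choice of two site charges could reproduce it, which is the limitation discussed above.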

A further disadvantage of the potential-fitted point-charge approach is that at short distances there are penetration contributions to the electrostatic potential (discussed further below), and these cannot be represented by any expansion in powers of 1/R because they decay exponentially with distance. Such effects should be included in the short-range part of the potential; attempts to include them in the point-charge model merely distort the model without improving its accuracy.

Perturbation  Theory  at  Short  Range 

Perturbation theory becomes troublesome when the intermolecular distance is short. This is not because the perturbation becomes too large; for configurations that are accessible at thermal energies the interaction is usually still very small compared with the separation between electronic energy levels, so perturbation theory should still converge rapidly. The problem arises from the overlap of the wavefunctions, and the consequences for perturbation theory are profound. In the first place, it becomes impossible to distinguish between electrons that ‘belong’ to molecule A and electrons that ‘belong’ to B. It is therefore no longer possible to separate the Hamiltonian for the entire system in the manner of eq. (2), and there is no satisfactory way to define a perturbation operator representing the interaction between the molecules. In the second place, the wavefunctions for the unperturbed system cannot be taken to be simple products of the form $\psi^A_m\psi^B_n$; they have to be antisymmetrized with respect to all electron permutations. A more serious difficulty is that, whether antisymmetrized or not, the functions $\psi^A_m\psi^B_n$ are not orthogonal to each other. Ordinary Rayleigh-Schrödinger perturbation theory assumes that the unperturbed Hamiltonian has a complete set of eigenfunctions, and that they are orthogonal.

Many versions of perturbation theory have been proposed to overcome these problems. A large number of them rely on an expansion of the perturbation equations in powers of the overlap between the functions on A and those on B. This approach appears to work when small basis sets are used, but as the basis is improved, the overlap between the functions on the two molecules becomes larger, and the expansion ceases to converge. This failure of the overlap expansion occurs with quite modest basis sets. Accordingly it is necessary to use a method that deals explicitly with the natural non-orthogonal basis functions for the problem.

If we have to use non-orthogonal wavefunctions, then the natural one-electron orbitals in which to express them are the SCF molecular orbitals of the non-interacting molecules. From these we can construct antisymmetrized (determinantal) wavefunctions in which some orbitals of each molecule are occupied. Because of the non-orthogonality of the orbitals, these determinantal wavefunctions will also be non-orthogonal. It is possible to construct a perturbation theory in which the wavefunction is expanded in terms of these determinants. Fortunately it is possible to formulate it in such a way that the separation of the Hamiltonian into an unperturbed part and a perturbation is unnecessary. The resulting Intermolecular Perturbation Theory (IMPT)[30] has been incorporated into the Cambridge Analytic Derivatives Package (CADPAC)[24].

The first-order perturbation energy in the short-range theory can be separated into two parts: the electrostatic energy and the exchange-repulsion energy. The formal expression for the electrostatic interaction is still eq. (5) at short range, but because the charge densities overlap, it can no longer be expressed completely in terms of a multipole or distributed-multipole expansion. There are now additional terms that describe the effects arising from the interpenetration of the charge distributions. This penetration effect is illustrated by the simple case of a He⁺ ion interacting with a proton[23,31]. The electrostatic interaction can be evaluated explicitly in this case, and is (in atomic units)

$$U_{\mathrm{es}} = \frac{1}{R} + \left(Z + \frac{1}{R}\right) e^{-2ZR},$$

where R is the separation. Here we can distinguish two terms: the ‘multipolar part’ 1/R, which is the classical repulsion between two unit charges at distance R, and the ‘penetration term’ (Z + 1/R)exp(−2ZR) (where Z = 2 is the helium nuclear charge), which describes the modification to the multipolar expression that arises from the penetration of the proton within the electronic charge distribution of the He⁺. We see that the latter decays exponentially with separation. It is often said that the multipole expansion of the complete potential is an asymptotic expansion in 1/R, because it is impossible to find a convergent series in 1/R for an exponential $e^{-aR}$. However, it is much more satisfactory to separate the multipolar part of the interaction, which converges under well-defined and easily attainable conditions[23,31], from the exponential penetration part, for which any attempt at an expansion in powers of 1/R is pointless, and which is much better regarded as part of the short-range interaction.
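
The He⁺ result can be checked by direct integration. The sketch below assumes a hydrogen-like 1s electron density $(Z^3/\pi)e^{-2Zr}$ and atomic units; it computes the electron's potential at the proton from Gauss's theorem (charge inside R acts as if at the centre, each outer shell contributes its charge divided by its radius), subtracts it from the nuclear repulsion Z/R, and compares the result with the closed form. The function names and the quadrature settings are hypothetical choices.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n intervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def electron_potential(r_sep, z):
    """Potential at distance r_sep from the 1s density (Z**3/pi) exp(-2 Z r)."""
    def density(r):
        return z**3 / math.pi * math.exp(-2.0 * z * r)
    inner = trapezoid(lambda r: 4.0 * math.pi * r * r * density(r), 0.0, r_sep, 20000)
    outer = trapezoid(lambda r: 4.0 * math.pi * r * density(r), r_sep, r_sep + 40.0 / z, 20000)
    return inner / r_sep + outer

def he_plus_proton(r_sep, z=2.0):
    """Electrostatic energy (hartree) of a proton at r_sep (bohr) from He+."""
    numeric = z / r_sep - electron_potential(r_sep, z)
    multipolar = 1.0 / r_sep
    penetration = (z + 1.0 / r_sep) * math.exp(-2.0 * z * r_sep)
    return numeric, multipolar, penetration


results = {r: he_plus_proton(r) for r in (0.5, 1.0, 2.0, 4.0)}
```

At R = 4 bohr the penetration term is already only about 2.5 × 10⁻⁷ hartree, while the multipolar term is 0.25 hartree.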

Unfortunately we have no way at present of evaluating the penetration part of the electrostatic interaction other than by taking the difference between the exact electrostatic energy (5), which includes the penetration effects, and the multipolar approximation (8), which does not. This is clearly no help at all, since it requires the full calculation that we are trying to avoid. However, Hall has found[32] that a good account of the molecular electrostatic potential, including penetration effects, is given by a ‘currant bun’ model comprising point charges together with a sum of a small number of spherical Gaussian charge distributions. The point charges in such a model yield a form of distributed multipole expansion, while the spherical Gaussians yield both multipolar and penetration contributions. Here we are dealing with the electrostatic interaction between a molecule and a formal test charge, but a similar idea may be helpful in describing the interaction between two molecules. Whether or not this particular approach does prove fruitful, there is a good prospect that we can find an efficient way to describe the penetration effects as well as the long-range effects in terms of monomer properties.
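
The way a spherical Gaussian contributes both kinds of term can be seen from its potential: a normalized Gaussian charge q centred at the origin produces q·erf(√α d)/d at distance d, which splits exactly into a multipolar part q/d and a penetration part −q·erfc(√α d)/d. The sketch below evaluates a model of this kind at a test point; it is not Hall's fitted model, and the function name and all parameters are hypothetical. Note that for a Gaussian the penetration part decays like exp(−αd²), even faster than the exponential decay of a real molecular density.

```python
import math

def model_potential(point, charges, gaussians):
    """Potential at `point` from point charges plus spherical Gaussian charges.

    charges   : list of (q, centre)
    gaussians : list of (q, alpha, centre); density q*(alpha/pi)**1.5*exp(-alpha*r**2)
    Returns (multipolar, penetration); their sum is the total potential.
    """
    multipolar = penetration = 0.0
    for q, centre in charges:
        multipolar += q / math.dist(point, centre)
    for q, alpha, centre in gaussians:
        d = math.dist(point, centre)
        # q*erf(sqrt(alpha) d)/d = q/d - q*erfc(sqrt(alpha) d)/d
        multipolar += q / d
        penetration -= q * math.erfc(math.sqrt(alpha) * d) / d
    return multipolar, penetration


# Hypothetical neutral "atom": +1 point nucleus and a -1 Gaussian cloud (alpha = 1).
nucleus = [(1.0, (0.0, 0.0, 0.0))]
cloud = [(-1.0, 1.0, (0.0, 0.0, 0.0))]
m2, p2 = model_potential((0.0, 0.0, 2.0), nucleus, cloud)
m6, p6 = model_potential((0.0, 0.0, 6.0), nucleus, cloud)
```

For this neutral model the multipolar part vanishes at every point, so the whole potential outside the nucleus is penetration.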

The second additional term that appears in the first-order energy at short range is the exchange-repulsion energy, which is much more difficult to deal with. It comprises two effects: an attractive exchange term, which arises because the electrons can exchange between the molecules, and a repulsive term, which occurs because electrons cannot occupy the same region of space if they have the same spin (Pauli repulsion). However, it is not usually helpful to separate these two parts. Because the exchange-repulsion is a first-order perturbation term, it is still only necessary to know the unperturbed wavefunctions to evaluate it. Nevertheless it is time-consuming to calculate, because all the intermolecular electron-repulsion integrals are needed, and they are different for each configuration of the complex. Moreover, many of the usual simplifications that arise in the evaluation of two-electron integrals do not apply, because the wavefunctions of the two molecules are not orthogonal. The labour is even greater if correlated wavefunctions are used for the monomers, though Rijks et al.[33] have suggested how the computational effort may be reduced.

Unfortunately  there  is  as  yet  no  known  way  to  obtain  the  repulsion  energy  from  properties  of 
the  separate  molecules.  An  attempt  has  been  made  to  characterise  the  repulsive  surface  of  a  molecule 
by  performing  IMPT  calculations  between  the  molecule  and  a  suitable  test  particle,  such  as  a  helium 
atom.  Because  the  helium  atom  has  only  one  molecular  orbital  and  is  spherically  symmetrical,  such 
calculations  can  be  done  much  more  easily  than  calculations  involving  two  ordinary  molecules.  From 
the  data  for  the  repulsion  between  molecule  A  and  the  test  particle,  and  between  B  and  the  test 
particle, it may be possible to construct a repulsive potential between A and B. Some limited progress
has  been  made  with  this  idea[34].  An  alternative  approach[35]  has  been  based  on  the  suggestion[36-38] 
that  the  repulsion  energy  is  closely  correlated  with  the  overlap  between  the  molecular  wavefunctions, 
but  this  seems  likely  to  be  more  useful  as  a  guide  to  the  form  of  analytic  models  than  as  a  direct  route 
to  accurate  potential  functions. 

Consequently it is necessary at present to rely on suitable models of the short-range contributions to the energy. A successful form of model repulsion potential is the following anisotropic atom-atom repulsion:

$$U_{\mathrm{rep}} = \sum_{ab} \exp\left[-\alpha_{ab}(\Omega_{ab})\left(R_{ab} - \rho_{ab}(\Omega_{ab})\right)\right], \qquad (13)$$

in which the shape parameter $\rho_{ab}$ and the hardness parameter $\alpha_{ab}$ depend on the relative orientation $\Omega_{ab}$ of atoms a and b. (In the general case, $\Omega_{ab}$ is a short-hand for five orientation variables.) There is now overwhelming evidence[31] of the need to introduce this kind of anisotropy into atom-atom repulsions. More elaborate forms for the radial dependence have been used for very accurate potentials, for example to describe the Ar···Ar interaction[39], but there is not enough information at present to characterise the repulsion in this much detail for larger systems. There it is necessary to fit a potential model such as (13) to whatever data may be available, either experimental data such as crystal structures or calculated potentials for smaller systems.
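
As an illustration of how an orientation-dependent shape parameter enters eq. (13), the sketch below uses a hypothetical, deliberately simple anisotropy, $\rho_{ab} = \rho_0 + \rho_1(\cos\theta_a + \cos\theta_b)$, where θ_a is the angle between an axis fixed in atom a (say, pointing outward along its bond) and the vector from a to b, with a constant hardness α. The functional form, the optional energy prefactor `k` and all parameter values are assumptions for illustration only; published models use more general expansions in the orientation variables. A negative ρ₁ mimics atoms that are flattened at the ends of the bond.

```python
import math

def site_repulsion(pos_a, axis_a, pos_b, axis_b, alpha, rho0, rho1, k=1.0):
    """One term of eq. (13) with a hypothetical anisotropic shape function.

    axis_a, axis_b : unit vectors fixed in each atom (e.g. outward along its bond)
    rho = rho0 + rho1 * (cos theta_a + cos theta_b), where theta_a is the angle
    between axis_a and the vector a->b, and theta_b likewise for b->a.
    k is an assumed energy prefactor (k = 1 gives the form written in eq. (13)).
    """
    r_vec = [pb - pa for pa, pb in zip(pos_a, pos_b)]
    r = math.sqrt(sum(c * c for c in r_vec))
    n = [c / r for c in r_vec]
    cos_a = sum(u * v for u, v in zip(axis_a, n))
    cos_b = -sum(u * v for u, v in zip(axis_b, n))
    rho = rho0 + rho1 * (cos_a + cos_b)
    return k * math.exp(-alpha * (r - rho))

def total_repulsion(atoms_a, atoms_b, alpha, rho0, rho1):
    """Sum eq. (13) over all site pairs; each atom is a (position, axis) tuple."""
    return sum(site_repulsion(pa, xa, pb, xb, alpha, rho0, rho1)
               for pa, xa in atoms_a for pb, xb in atoms_b)


# Hypothetical parameters; same separation, end-on versus side-on contact.
params = (3.5, 3.4, -0.15)
end_on = site_repulsion((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 3.6), (0.0, 0.0, -1.0), *params)
side_on = site_repulsion((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (3.6, 0.0, 0.0), (0.0, 0.0, -1.0), *params)
```

With ρ₁ < 0 the end-on contact is less repulsive than the side-on one at the same separation, which is the kind of effect that allows closer end-to-end approach in the halogen crystals discussed below.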

One  of  the  attractive  features  of  the  repulsion  model  (13)  is  that  because  of  its  intrinsic  anisotropy 
it  provides  a  natural  way  to  describe  repulsions  between  small  groups  of  atoms,  so  that  it  is  not 
necessary  to  treat  every  atom  individually.  In  the  azabenzenes,  for  instance,  which  are  derived  from 
benzene by replacing one or more CH units by N atoms, it was possible to obtain a model of the
intermolecular potential in which each CH group and each N atom was treated as a single unit. The
CH···CH, CH···N and N···N repulsive potentials were transferable between different molecules,
and  the  multipole  moments  were  also  transferable  except  that  the  short-range  inductive  effect  was 
described  by  the  transfer  of  approximately  jj-  of  an  electron  from  C  to  N  along  every  CN  bond. 
Empirical  dispersion  terms  were  also  included.  The  resulting  model  was  very  successful  in  describing 
the very varied crystal structures of a number of azabenzenes[40].

A similar approach has been very successful in studies of the halogens Cl2, Br2 and I2. These molecules all have Cmca crystal structures, unlike most diatomics, and this cannot be understood until it is appreciated that the atoms in these molecules are significantly non-spherical, being flattened at the ends, by as much as 20% in the case of I2. The crystal structures are then easily understood as a consequence primarily of packing effects[41]. A distributed-multipole description of the electrostatics and an anisotropic atom-atom dispersion term were added to complete the potential model, which was then able to give a very good account of the liquid structure and properties and the lattice frequencies, despite being fitted only to the crystal structure and lattice energy[42].

Long-range perturbation theory — second-order effects

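A similar philosophy can be applied to the higher-order terms in the perturbation expansion. The second-order energy is, according to Rayleigh-Schrödinger perturbation theory,

$$U^{(2)} = -\sum_{(m,n)\neq(0,0)} \frac{\langle 00|V|mn\rangle\langle mn|V|00\rangle}{W^A_m + W^B_n - W^A_0 - W^B_0}, \qquad (14)$$

where $W^A_m$ and $W^B_n$ are the energies of the unperturbed states $\psi^A_m$ and $\psi^B_n$.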
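
This is conventionally separated into three terms: those in which m ≠ 0 but n = 0, those in which n ≠ 0 but m = 0, and those in which neither m nor n is 0. These three terms are

$$U^A_{\mathrm{ind}} = -\sum_{m\neq 0} \frac{\langle 00|V|m0\rangle\langle m0|V|00\rangle}{W^A_m - W^A_0}, \qquad (15)$$

$$U^B_{\mathrm{ind}} = -\sum_{n\neq 0} \frac{\langle 00|V|0n\rangle\langle 0n|V|00\rangle}{W^B_n - W^B_0}, \qquad (16)$$

$$U_{\mathrm{disp}} = -\sum_{m\neq 0}\sum_{n\neq 0} \frac{\langle 00|V|mn\rangle\langle mn|V|00\rangle}{W^A_m + W^B_n - W^A_0 - W^B_0}. \qquad (17)$$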
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In  the  first  of  these,  we  may  perform  the  integrations  over  the  ground-state  wavefunction  of  B  to 
obtain 

$$U^A_{\rm ind} = -\sum_{m\neq0}\frac{\langle0|V^B|m\rangle\langle m|V^B|0\rangle}{W^A_m - W^A_0}, \tag{18}$$

where $V^B = \langle\psi^B_0|V|\psi^B_0\rangle$ is the potential at A due to the unperturbed charge distribution of B. Eq. (18)
then  describes  the  response  of  molecule  A  to  this  potential,  and  is  the  induction  energy  of  molecule  A 
in  the  field  of  B.  Similarly  eq.  (16)  is  the  induction  energy  of  B  in  the  field  of  A. 


These  induction  energy  expressions  can  be  reformulated  so  as  to  depend  only  on  properties  of  the 
individual  molecules.  To  do  this,  we  start  from  the  expression  (18)  for  the  induction  energy  of  molecule 
A,  and  use  the  expression  (6)  for  the  perturbation  V.  We  evaluate  the  integral  over  the  coordinates 
of  molecule  B,  and  arrive  at  the  expression 


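$$U^A_{\rm ind} = -\sum_{m\neq0}\sum_{aa'bb'}\sum_{tt'uu'}\frac{Q^b_u T^{ab}_{tu}\,\langle0|Q^a_t|m\rangle\langle m|Q^{a'}_{t'}|0\rangle\,T^{a'b'}_{t'u'}Q^{b'}_{u'}}{W^A_m - W^A_0} = -\tfrac12\sum_{aa'bb'}\sum_{tt'uu'} Q^b_u T^{ab}_{tu}\,\alpha^{aa'}_{tt'}\,T^{a'b'}_{t'u'}Q^{b'}_{u'}, \tag{19}$$

where

$$\alpha^{aa'}_{tt'} = \sum_{m\neq0}\frac{\langle0|Q^a_t|m\rangle\langle m|Q^{a'}_{t'}|0\rangle + \langle0|Q^{a'}_{t'}|m\rangle\langle m|Q^a_t|0\rangle}{W^A_m - W^A_0}. \tag{20}$$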

Here $\alpha^{aa'}_{tt'}$ is a polarizability that describes the response of the moment t at site a to a perturbation
(a change in potential, field, field gradient, etc.) at site a'[43].
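A small numerical sketch may make eqs. (19) and (20) more concrete. It keeps only the z-dipole terms ($t = t' = 10$) for a hypothetical two-site molecule whose excited states carry transition moments shared between the two sites. All numbers are invented (atomic units), the wavefunctions are assumed real so that the two terms in the numerator of eq. (20) are equal, and the function names are illustrative only.

```python
def distributed_polarizability(dW, moments):
    """Uncoupled sum-over-states polarizabilities alpha[a][a'] (z-dipole only).

    dW[m]         -- excitation energy W_m - W_0 of excited state m
    moments[m][a] -- transition dipole <0|mu_z^a|m> assigned to site a
    With real wavefunctions the two numerator terms of eq. (20) coincide:
    alpha[a][a'] = 2 * sum_m <0|mu^a|m><m|mu^a'|0> / (W_m - W_0).
    """
    nsites = len(moments[0])
    alpha = [[0.0] * nsites for _ in range(nsites)]
    for dw, mu in zip(dW, moments):
        for a in range(nsites):
            for b in range(nsites):
                alpha[a][b] += 2.0 * mu[a] * mu[b] / dw
    return alpha


def induction_energy(alpha, fields):
    """Dipole-only form of eq. (19): U = -1/2 sum_{a,a'} F_a alpha^{aa'} F_a'."""
    n = len(fields)
    return -0.5 * sum(fields[a] * alpha[a][b] * fields[b]
                      for a in range(n) for b in range(n))


dW = [0.5, 0.9]              # hypothetical excitation energies (hartree)
moments = [[0.6, 0.4],       # state 1: transition moment split over sites 1 and 2
           [0.2, -0.3]]      # state 2
fields = [0.010, 0.004]      # hypothetical z-fields of molecule B at the two sites

alpha = distributed_polarizability(dW, moments)
alpha_total = sum(sum(row) for row in alpha)   # single-centre equivalent in this model
U_ind = induction_energy(alpha, fields)
print("local:", alpha[0][0], alpha[1][1], "non-local:", alpha[0][1])
print("total polarizability:", alpha_total, "U_ind:", U_ind)
```

In this dipole-only model the sum of all the $\alpha^{aa'}$ elements is the single-centre polarizability, but the induction energy depends on how that total is distributed, because the fields of B at the two sites differ.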

Now  the  expression  (20)  is  an  uncoupled  formulation  of  the  polarizability.  We  can  replace  it  by  a 
polarizability derived from coupled Hartree-Fock perturbation theory, which is more accurate, because
it  takes  account  of  the  reorganisation  of  the  electron  distribution  in  a  self-consistent  manner.  Better 
still  would  be  to  evaluate  the  monomer  polarizability  by  a  method  that  takes  account  of  electron 
correlation as well[44]. But whatever the level of calculation, we can once again perform a much
better  calculation  of  the  monomer  property  than  is  possible  for  the  dimer.  In  this  way  we  arrive 
at  a  description  of  the  induction  energy  that  is  far  more  accurate  than  we  can  obtain  through  either 
intermolecular  perturbation  theory,  where  the  perturbation  is  treated  in  an  uncoupled  fashion,  or  from 
a supermolecule calculation, where the size of the basis is limited by the need to perform calculations
at  a  large  number  of  points  on  the  potential  energy  surface. 


Distributed  Polarizabilities 

The  distributed  polarizabilities  that  occur  in  eq.  (19)  for  the  induction  energy  may  be  unfamiliar 
objects.  They  describe  the  change  in  the  charge  distribution  of  region  a  of  the  molecule  (in  terms  of 
changes  to  the  charge,  dipole  moment,  and  so  forth)  that  are  caused  by  a  change  in  the  electrostatic 
fields  (potential,  electric  field,  field  gradient,  etc.)  at  site  a'. 

If sites a and a' are the same, then we have a local polarizability. For example, the polarizability
$\alpha^{aa}_{10,10}$ describes the change in the z component of the dipole of region a that results from an electric
field in the z direction at site a. (Remember that the moment $Q_{10}$ is the same as the dipole component
$\mu_z$; see eq. (10).) This is an ordinary local dipole polarizability. However, the induced dipole of region
a changes the field that is experienced by the other sites, so there are other non-local effects of the
original field at a, and these are described by the non-local polarizabilities $\alpha^{aa'}_{tt'}$, in which $a \neq a'$.
Because they describe secondary polarizations they are usually smaller than the local polarizabilities.


When  a  molecule  is  divided  into  several  regions,  it  becomes  possible  for  charge  to  flow  between 
the  regions  in  response  to  a  potential  difference  between  them.  This  means  that  there  are  charge-flow 
polarizabilities $\alpha^{aa'}_{00}$ (remember that the multipole $Q_{00}$ is the charge) that describe the change in the
charge of region a when the potential at site a' is changed. The total charge on the molecule must
remain constant, so the total change in charge must be zero; this is expressed by a sum rule for the
charge-flow polarizabilities:

$$\sum_a \alpha^{aa'}_{00} = 0.$$

It  follows  that  the  non-local  charge-flow  polarizabilities  are  of  similar  magnitude  to  the  local  ones;  if 
the  charge  in  region  a  changes  by  a  certain  amount,  the  charges  on  the  neighbouring  sites  must  change 
by  amounts  of  similar  magnitude  for  the  total  charge  to  remain  constant. 
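The sum rule and its consequence can be checked with a deliberately simple model, which is an assumption of this sketch and not taken from the text: charge may flow only between neighbouring sites of a hypothetical four-site chain, each bond having the same "conductance" κ, and the induced charges are taken as $\Delta Q^a_{00} = -\sum_{a'}\alpha^{aa'}_{00}V^{a'}$. The resulting charge-flow polarizability matrix is κ times the graph Laplacian of the chain.

```python
def chain_charge_flow_polarizability(nsites, kappa):
    """Hypothetical nearest-neighbour model: kappa times the graph Laplacian of a chain."""
    alpha = [[0.0] * nsites for _ in range(nsites)]
    for a in range(nsites - 1):          # bond between sites a and a+1
        b = a + 1
        alpha[a][a] += kappa
        alpha[b][b] += kappa
        alpha[a][b] -= kappa
        alpha[b][a] -= kappa
    return alpha


def induced_charges(alpha, potentials):
    """Assumed convention: dQ_a = -sum_a' alpha[a][a'] * V_a'."""
    n = len(potentials)
    return [-sum(alpha[a][b] * potentials[b] for b in range(n)) for a in range(n)]


alpha = chain_charge_flow_polarizability(4, kappa=0.5)
column_sums = [sum(alpha[a][b] for a in range(4)) for b in range(4)]
dq = induced_charges(alpha, [0.02, -0.01, 0.005, 0.0])   # hypothetical site potentials
print("column sums:", column_sums)
print("induced charges:", dq, "net:", sum(dq))
```

The columns sum to zero, so any set of site potentials produces zero net charge, and the non-local elements are of the same order as the local elements they balance.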


The reason for using distributed polarizabilities to describe the distortion of the charge distribution
is the same as the reason for using distributed multipoles to describe the charge distribution itself:
it is a matter of convergence and efficiency of description. The electrostatic field at molecule A of a
neighbouring molecule B is strongly non-uniform, and a Taylor expansion about the centre of A will
converge very slowly, if at all, at points far removed from that centre. If it does converge, an accurate
description of the field requires many terms in the Taylor expansion, for which high-order derivatives
of the potential are needed. The effect of these high-order derivatives of the field on the charge
distribution of A is described by high-rank polarizabilities. If the Taylor expansion of the electrostatic
potential of B does not converge at some points within A, then no single-centre polarizability
description of the distortion is possible. If, on the other hand, we study each small region of A separately, the
convergence is much better, and the description of the distortion can be achieved by using distributed
polarizabilities of relatively low rank. The inadequacy of the single-centre description is illustrated by
the observation[45] that even with distributed polarizabilities it is still necessary to include terms up
to quadrupole rank at least to obtain a satisfactory description of the induction energy.
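The convergence argument can be illustrated with the simplest case: the potential of a unit point charge belonging to B, at distance R from the centre of A, expanded about that centre in Legendre polynomials, $1/|\mathbf R - \mathbf r| = \sum_l r^l P_l(\cos\gamma)/R^{l+1}$, which converges only for $r < R$. The sketch evaluates the error of the truncated single-centre expansion at field points on the line joining the charge and the centre (the least favourable direction); the distances are arbitrary illustrative choices.

```python
import math


def legendre_values(lmax, x):
    """Return [P_0(x), ..., P_lmax(x)] from the standard three-term recurrence."""
    p = [1.0, x]
    for l in range(1, lmax):
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]


def single_centre_potential(r, R, cos_gamma, lmax):
    """Expansion of 1/|R - r| about the centre of A, truncated at rank lmax."""
    p = legendre_values(lmax, cos_gamma)
    return sum(r ** l / R ** (l + 1) * p[l] for l in range(lmax + 1))


def exact_potential(r, R, cos_gamma):
    return 1.0 / math.sqrt(R * R + r * r - 2.0 * R * r * cos_gamma)


R, cos_gamma = 1.0, 1.0        # unit charge at unit distance; field points on the same line
errors = {}
for r in (0.3, 0.9, 1.2):      # distance of the field point from the centre of A
    exact = exact_potential(r, R, cos_gamma)
    errors[r] = [abs(single_centre_potential(r, R, cos_gamma, lmax) - exact)
                 for lmax in (2, 4, 8)]
    print(f"r/R = {r}: errors for lmax = 2, 4, 8 -> {errors[r]}")
```

At r = 0.3R a few terms suffice, at r = 0.9R convergence is very slow, and at r = 1.2R adding terms makes the result worse, which is the situation in which no single-centre description is possible.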


Distributed polarizabilities are also needed for calculating the induced moments of van der Waals
complexes. For the reasons outlined above, single-centre polarizabilities are not satisfactory for
calculating the moments induced in one molecule of a complex by its neighbours, and some apparent
anomalies were found when such calculations were attempted. Induced moments calculated using
distributed polarizabilities show much more satisfactory agreement with experiment, and the apparent
anomalies can be understood in terms of features of the distributed multipoles or distributed
polarizabilities of the molecules concerned[46]. A simple example is the case of the complexes N2···HCl
and OC···HCl, which have rather different induced moments although the polarizabilities of N2 and
CO are almost identical[47]. The distributed polarizabilities show that the C atom of CO is more
polarizable, and the O atom less polarizable, than the N atom in N2. Since the C is closer to the HCl,
it is in a stronger field than is the O, and the overall induced dipole is larger. It is also the case that
the atomic dipole on the C atom in CO is larger than that on the N atoms in N2, so that the HCl is
more strongly polarized by the CO than by the N2[46].
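The N2···HCl / OC···HCl argument can be made semi-quantitative with a toy calculation. It assumes a linear complex, represents the field of HCl by that of a point dipole, keeps only local dipole polarizabilities along the axis (no non-local or higher-rank terms), and compares three descriptions of a molecule with the same total polarizability: a single centre at the bond midpoint, two equal atomic sites (N2-like), and two unequal sites with the more polarizable atom nearer the HCl (CO-like). All numbers are illustrative and are not the values of refs. [46, 47].

```python
def axial_field(mu, r):
    """Field on the axis of a point dipole mu at distance r (atomic units)."""
    return 2.0 * mu / r ** 3


def induced_dipole(site_alphas, site_distances, mu_source):
    """Sum of local induced dipoles alpha_a * F(r_a); non-local terms are neglected."""
    return sum(a * axial_field(mu_source, r)
               for a, r in zip(site_alphas, site_distances))


mu_hcl = 0.43          # illustrative source dipole (a.u.)
R, d = 6.0, 2.1        # hypothetical distance to the near site and bond length (bohr)
alpha_total = 11.7     # same total polarizability for every description (illustrative)

single_centre = induced_dipole([alpha_total], [R + d / 2], mu_hcl)
n2_like = induced_dipole([alpha_total / 2, alpha_total / 2], [R, R + d], mu_hcl)
co_like = induced_dipole([7.0, alpha_total - 7.0], [R, R + d], mu_hcl)  # near atom more polarizable
print(single_centre, n2_like, co_like)
```

Because the dipole field falls off as $r^{-3}$, moving polarizability towards the near end increases the induced dipole even though the total polarizability is unchanged; the single-centre model, which samples the field only at the midpoint, underestimates it.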

There  is  another  way  in  which  the  expression  (19)  might  be  improved  upon.  It  gives  the  induction 
energy  of  molecule  A  in  the  field  arising  from  the  multipole  moments  of  molecule  B.  However  molecule 
B is also polarizable, and its moments will be modified by the presence of molecule A. If this effect is
taken  into  account,  we  arrive  at  an  expression  for  the  induction  energy  that  is  a  power  series  in  the 
molecular  polarizabilities[48].  In  practice,  the  effects  of  molecular  polarization  are  usually  calculated 
in  an  iterative  fashion;  the  polarized  moments  of  each  molecule  are  evaluated  in  the  field  due  to  the 
other  molecules,  and  the  calculation  is  repeated  until  the  polarized  moments  are  self-consistent.  This 
however  is  equivalent[48]  to  taking  some,  but  not  all,  of  the  terms  in  a  perturbation  series  to  infinite 
order,  and  moreover  it  is  known[49]  that  the  perturbation  series  for  the  induction  energy  is  asymptotic, 
i.e.  divergent.  This  means  that  the  conventional  iterative  procedure  is  highly  questionable,  and  indeed 
it is known to lead to singularities at short range[50]. Numerical investigation[45] suggests that the
simple  expression  (19)  is  more  satisfactory,  provided  that  distributed  polarizabilities  are  used  and 
provided  that  polarizabilities  up  to  at  least  quadrupole  rank  are  included. 
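The danger of iterating the induced moments to self-consistency can be seen in a toy model, which is an illustration and not the analysis of refs. [48-50]: two identical isotropic point polarizabilities α on a common axis, separated by R, in a uniform external field $F_0$ along that axis (atomic units). The on-axis field of a dipole μ is $2\mu/R^3$, so the iteration builds up the geometric series $\mu = \alpha F_0\sum_k (2\alpha/R^3)^k$, which converges only for $R^3 > 2\alpha$; at $R^3 = 2\alpha$ the self-consistent result is singular. The parameter values are hypothetical.

```python
def iterate_induced_dipoles(alpha, R, F0, max_iter=200, tol=1e-12):
    """Iterate the induced dipoles of two identical collinear point polarizabilities.

    Atomic units; the field of a dipole mu on its own axis at distance R is 2*mu/R**3.
    Returns (mu, converged), where mu is the induced dipole on either site.
    """
    coupling = 2.0 / R ** 3
    mu1 = mu2 = 0.0
    for _ in range(max_iter):
        new1 = alpha * (F0 + coupling * mu2)
        new2 = alpha * (F0 + coupling * mu1)
        if abs(new1 - mu1) < tol and abs(new2 - mu2) < tol:
            return new1, True
        mu1, mu2 = new1, new2
    return mu1, False


alpha, F0 = 1.5, 0.01          # hypothetical polarizability and external field
for R in (3.0, 1.2):
    mu, converged = iterate_induced_dipoles(alpha, R, F0)
    closed_form = alpha * F0 / (1.0 - 2.0 * alpha / R ** 3)   # sum of the geometric series
    print(f"R = {R}: converged = {converged}, iterated mu = {mu:.4g}, "
          f"closed form = {closed_form:.4g}")
```

At the larger separation the iteration reproduces the closed form; at the shorter one it diverges, and the closed form itself has already passed through its singularity.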

As  in  the  case  of  the  first-order  energy,  there  are  two  modifications  that  have  to  be  made  to  this 
description  when  the  wavefunctions  overlap.  In  the  first  place,  there  must  be  a  penetration  effect  that 
arises  from  the  overlap  of  the  charge  densities.  This  has  not  been  examined  in  detail,  but  it  appears  to 
be  small[45].  The  second  modification  arises  because  when  two  molecules  overlap,  it  becomes  possible 
for  electron  density  from  either  molecule  to  flow  onto  the  other.  This  effect  is  called  charge  transfer. 
In  perturbation  theory  terms,  it  can  be  described  by  excitations  from  the  occupied  orbitals  of  one 
molecule to the virtual orbitals of the other. As such, it incorporates not only the genuine physical
effect of charge transfer, but also the BSSE. One of the virtues of IMPT is that the effects of BSSE do not
arise in most of the energy terms, and the charge-transfer interaction is the only term involving single
excitations in which they do occur. It is possible to correct the calculated charge-transfer energy for
these  effects  by  a  procedure  similar  to  that  used  in  the  supermolecule  method,  but  as  in  that  case 
some  uncertainties  remain. 

There  is  a  further  problem  with  charge  transfer,  however.  If  we  could  perform  the  calculation  with 
a  complete  basis  set  on  molecule  A,  then  it  would  be  possible  to  describe  any  virtual  orbital  of  B  in 
terms  of  the  basis  set  for  A.  In  this  case,  the  charge  transfer  effects  would  be  included  completely  in  the 
induction  energy  for  molecule  A.  If  we  were  then  to  calculate  the  effects  of  excitations  from  occupied 
orbitals  of  A  to  virtual  orbitals  of  B  we  would  be  counting  the  same  effects  again.  In  other  words, 
the  charge-transfer  energy  is  formally  spurious.  In  practice,  the  basis  sets  used  today  are  too  small 
for this to happen to any great extent, but we should be aware that in principle the charge-transfer
contribution  is  subject  to  this  kind  of  double-counting  error. 

The  dispersion  energy 

The  expression  (17)  for  the  dispersion  energy  can  also  be  replaced  by  a  more  accurate  expression  in 
terms  of  monomer  properties.  Eq.  (17)  involves  one-electron  excitations  on  both  molecules,  and  cannot 
appear in an SCF calculation on the supersystem; accordingly the dispersion energy is a
manifestation of electron correlation. Nevertheless, because there is a single excitation on each molecule, the
dispersion energy can be reformulated in terms of monomer polarizabilities, which are one-electron
properties  and  can  be  calculated  reasonably  accurately  at  the  SCF  level. 

The  key  is  the  replacement  of  the  energy  denominator  via  the  Casimir-Polder  identity  [51],  which 
can  be  established  by  a  simple  contour  integration: 


$$\frac{1}{A+B} = \frac{2}{\pi}\int_0^\infty \frac{AB}{(A^2+\nu^2)(B^2+\nu^2)}\,d\nu, \qquad A>0,\; B>0. \tag{21}$$
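Identity (21) is easy to check numerically. The sketch assumes nothing beyond the identity itself: it maps the semi-infinite range onto $[0, \pi/2]$ with $\nu = \tan t$ and applies composite Simpson's rule, with the integrand rewritten in terms of sin t and cos t so that it stays finite at $t = \pi/2$. The values of A and B are arbitrary positive test numbers.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def casimir_polder_rhs(A, B, n=4000):
    """(2/pi) * integral_0^inf A B / ((A^2 + nu^2)(B^2 + nu^2)) d nu, via nu = tan t."""
    def integrand(t):
        c, s = math.cos(t), math.sin(t)
        # A B sec^2 t / ((A^2 + tan^2 t)(B^2 + tan^2 t)), multiplied through by cos^4 t
        return A * B * c * c / ((A * A * c * c + s * s) * (B * B * c * c + s * s))
    return 2.0 / math.pi * simpson(integrand, 0.0, math.pi / 2, n)


for A, B in [(0.7, 1.9), (0.5, 1.0)]:
    print(A, B, casimir_polder_rhs(A, B), 1.0 / (A + B))
```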


In eq. (17) we put $A = W^A_m - W^A_0$ and $B = W^B_n - W^B_0$, and then the use of eq. (21), together with the
expression  (6)  for  the  perturbation,  leads  to  the  following  expression  for  the  dispersion  energy: 


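$$U_{\rm disp} = -\frac{\hbar}{2\pi}\sum_{aa'bb'}\sum_{tt'uu'} T^{ab}_{tu}\,T^{a'b'}_{t'u'}\int_0^\infty \alpha^{aa'}_{tt'}(i\nu)\,\alpha^{bb'}_{uu'}(i\nu)\,d\nu, \tag{22}$$

where the polarizability at imaginary frequency $i\nu$ is given by

$$\alpha^{aa'}_{tt'}(i\nu) = \sum_{m\neq0}\frac{\Delta W_m\left(\langle0|Q^a_t|m\rangle\langle m|Q^{a'}_{t'}|0\rangle + \langle0|Q^{a'}_{t'}|m\rangle\langle m|Q^a_t|0\rangle\right)}{(\Delta W_m)^2 + \hbar^2\nu^2}, \tag{23}$$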

with $\Delta W_m = W_m - W_0$. This is a much more tractable and useful formulation than it appears
to be at first sight. Once again the multipole moment operators $Q^a_t$ are referred to local axes in
the molecule, so the polarizabilities defined by (23) are also referred to local axes. Consequently no
information about the relative orientation of the molecules is required to evaluate the dispersion
integrals in (22), and they can be obtained once and for all for any pair of molecules, using accurate
coupled Hartree-Fock calculations on the monomer to obtain the polarizabilities. The need to evaluate
polarizabilities at imaginary frequency is not a problem; they can be calculated just as easily as the
static polarizabilities. Moreover eq. (23) shows that $\alpha^{aa'}_{tt'}(i\nu)$ is very well-behaved as a function of
frequency (it tends monotonically to zero as $\nu \to \infty$), so the dispersion integrals can be evaluated
accurately by numerical quadrature using the values of $\alpha^{aa'}_{tt'}(i\nu)$ at a dozen or so frequencies[44].
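The statement that a dozen frequencies suffice can be tried on a model. The sketch assumes an isotropic sum-over-states polarizability in oscillator-strength form, $\bar\alpha(i\nu) = \sum_m f_m/(\Delta W_m^2 + \nu^2)$ in atomic units, with invented excitation energies and oscillator strengths, and evaluates the isotropic dipole-dipole coefficient $C_6 = (3/\pi)\int_0^\infty \bar\alpha_A(i\nu)\,\bar\alpha_B(i\nu)\,d\nu$ with 12-point Gauss-Legendre quadrature after the mapping $\nu = \nu_0(1+t)/(1-t)$. The mapping and the scale $\nu_0$ are choices made here, not taken from ref. [44]. The exact value follows from identity (21): $C_6 = \tfrac32\sum_{mn} f_m f_n/[\Delta W_m\Delta W_n(\Delta W_m + \Delta W_n)]$.

```python
import numpy as np


def alpha_imag(nu, f_osc, dW):
    """Isotropic polarizability at imaginary frequency i*nu (atomic units):
    alpha(i nu) = sum_m f_m / (dW_m**2 + nu**2)."""
    nu = np.asarray(nu, dtype=float)[..., None]     # broadcast over the excited states
    return np.sum(f_osc / (dW ** 2 + nu ** 2), axis=-1)


def c6_quadrature(spec_a, spec_b, npts=12, nu0=1.0):
    """C6 = (3/pi) * integral_0^inf alpha_A(i nu) alpha_B(i nu) d nu, using npts
    Gauss-Legendre points mapped onto [0, inf) by nu = nu0 (1 + t) / (1 - t)."""
    t, w = np.polynomial.legendre.leggauss(npts)
    nu = nu0 * (1.0 + t) / (1.0 - t)
    jacobian = 2.0 * nu0 / (1.0 - t) ** 2
    integrand = alpha_imag(nu, *spec_a) * alpha_imag(nu, *spec_b)
    return 3.0 / np.pi * np.sum(w * jacobian * integrand)


def c6_exact(spec_a, spec_b):
    """Closed form obtained with identity (21)."""
    fa, wa = spec_a
    fb, wb = spec_b
    denom = np.outer(wa, wb) * (wa[:, None] + wb[None, :])
    return 1.5 * np.sum(np.outer(fa, fb) / denom)


# hypothetical spectra: (oscillator strengths, excitation energies in hartree)
spec_a = (np.array([0.6, 0.3]), np.array([0.5, 1.1]))
spec_b = (np.array([0.8]), np.array([0.9]))
print(c6_quadrature(spec_a, spec_b), c6_exact(spec_a, spec_b))
```

The integrand decays like $\nu^{-4}$ and has no real singularities, which is why a low-order quadrature of this kind is adequate.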

The expression (22) for the dispersion energy is rather cumbersome, since it involves non-local
polarizabilities $\alpha^{aa'}_{tt'}(i\nu)$. Many of these are small, and it is possible to transform the expression for the
dispersion energy in such a way that their effects are represented exactly, at sufficiently large distances,
by  local  terms[52].  However  this  cannot  always  be  done  for  the  dispersion  integrals  involving  charge- 
flow  polarizabilities  on  one  or  both  molecules,  and  these  dispersion  integrals  contribute  significantly 
to  the  overall  dispersion  energy.  It  seems  likely  that  the  conventional  site-site  picture  of  the  dispersion 
interaction  is  invalid,  or  at  least  incomplete,  for  large  conjugated  molecules. 

It should eventually be possible to obtain accurate polarizabilities and dispersion integrals
using correlated wavefunctions. Unfortunately there are some difficulties[53], but these are being
overcome[44].

Once  again  there  are  corrections  to  be  made  in  the  short-range  region.  Since  the  dispersion 
energy  is  part  of  the  correlation  energy  for  the  supersystem,  it  must  remain  finite  at  short  range,  while 
the terms in the multipole expansion (22) diverge like some power of 1/R. It is usual to multiply the
dispersion expression by a ‘damping function’ to cancel this singularity. Several authors have suggested
suitable damping functions[54-56].
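One widely used choice, given here as an example and not necessarily the form adopted in refs. [54-56], is the Tang-Toennies function $f_n(x) = 1 - e^{-x}\sum_{k=0}^{n} x^k/k!$ with $x = bR$. Because $f_6(bR)$ behaves like $(bR)^7/7!$ at small R, the damped term $-f_6(bR)\,C_6/R^6$ goes to zero instead of diverging, while $f_6 \to 1$ at long range. The sketch sums the tail of the exponential series when x is small, to avoid cancellation; the values of $C_6$ and b are hypothetical.

```python
import math


def tang_toennies(n, x):
    """Tang-Toennies damping function f_n(x) = 1 - exp(-x) * sum_{k=0}^{n} x**k / k!."""
    if x > n + 1:
        term, partial = 1.0, 1.0
        for k in range(1, n + 1):
            term *= x / k
            partial += term
        return 1.0 - math.exp(-x) * partial
    # Small x: sum the tail exp(-x) * sum_{k>n} x**k / k! directly to avoid cancellation.
    term = math.exp(-x)
    for k in range(1, n + 1):
        term *= x / k              # exp(-x) * x**n / n! after the loop
    tail = 0.0
    for k in range(n + 1, n + 60):
        term *= x / k              # exp(-x) * x**k / k!
        tail += term
    return tail


def damped_c6_term(R, C6, b):
    """-f_6(bR) C6 / R**6: finite as R -> 0, tends to -C6 / R**6 at long range."""
    return -tang_toennies(6, b * R) * C6 / R ** 6


C6, b = 10.0, 2.0                 # hypothetical coefficient and range parameter (a.u.)
for R in (0.1, 1.0, 3.0, 10.0):
    print(R, -C6 / R ** 6, damped_c6_term(R, C6, b))
```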


SUMMARY 

The  calculation  of  accurate  intermolecular  potentials  is  a  difficult  task.  Supermolecule  calculations  are 
expensive, and are subject to inaccuracies that are difficult to avoid or correct. An approach based on
perturbation theory principles holds some promise. The intermolecular potential is obtained as a sum
of  several  terms,  many  of  which  can  be  calculated  in  terms  of  properties  of  the  separate  molecules, 
which  can  be  obtained  much  more  accurately  than  is  possible  when  calculations  have  to  be  carried  out 
on  the  whole  complex.  We  are  led  in  this  way  to  seek  reliable  descriptions  of  the  charge  distribution  and 
polarizability  of  the  individual  molecules.  From  these  it  is  possible  to  construct  accurate  descriptions  of 
the  electrostatic,  induction  and  dispersion  energy,  leaving  only  the  short-range  terms  in  the  potential 
energy  to  be  modelled  by  more  empirical  methods. 

The  electrostatic  energy  is  obtained  in  terms  of  distributed  multipoles,  and  is  corrected  at  short 
range  for  the  effects  of  penetration.  The  multipole  moments  can  be  calculated  straightforwardly  from 
accurate  wavefunctions  for  the  individual  molecules.  It  is  necessary  to  use  multipole  moments  at  least 
as  high  as  quadrupole  if  multipole  sites  are  taken  on  each  atom.  This  means  that  the  expression  for 
the  electrostatic  energy  is  more  complicated  than  for  point  charges,  but  it  has  been  shown  that  it  can 
be handled efficiently even in the demanding circumstances of molecular dynamics calculations[42],
and the necessary formulae are all in the literature[9,7].

The induction energy is expressed in terms of distributed multipoles and distributed
polarizabilities. It is more difficult to handle than the electrostatic interaction, but the same interaction
functions occur. The dispersion energy can be expressed in terms of polarizabilities at imaginary
frequency for the individual molecules, and the use of distributed polarizabilities makes it possible to
describe  the  dispersion  interaction  accurately  at  short  range.  A  certain  amount  of  manipulation  is 
needed  to  transform  the  energy  expression  into  manageable  form,  but  when  this  is  done  the  resulting 
picture  is  very  similar  to  the  conventional  atom-atom  dispersion  models  that  are  widely  used,  except 
that  it  includes  additional  terms  that  arise  from  charge-flow  polarizabilities  in  the  distributed  picture. 
Damping functions must be included to correct for the effects of overlap[54-56]. The effects of electron
correlation should be included in the polarizabilities; this is currently difficult to do, but the principles
are understood[44].

The  techniques  are  therefore  all  available  for  a  new  generation  of  potentials  for  intermolecular 
and  intramolecular  interactions.  With  their  help  it  should  be  possible  to  model  such  interactions  in 
biological  systems  much  more  accurately  than  has  previously  been  possible. 
DISCUSSION


BUCKINGHAM  -  You  referred  to  the  need  for  damping  of  the  long-range  interactions  in 
the  region  of  overlap  of  the  electron  distributions  of  the  monomers.  Can  one  avoid  this 
by  a  suitable  choice  of  the  repulsive  potential,  thereby  reducing  the  number  of 
adjustable parameters?

STONE  -  If  the  dispersion  damping  were  to  be  described  by  adding  terms  to  the 
repulsive potential, they would have to behave like $R^{-6}$, $R^{-8}$, ... at small R to cancel the
singularities  in  the  long-range  dispersion  energy.  The  repulsion  energy  itself  does  not 
contain  terms  of  this  sort.  If  the  separation  R  does  not  become  too  small,  the  effects  of 
damping  could  probably  be  described  by  modifications  to  the  repulsive  potential. 
However  the  use  of  multiplicative  damping  factors  is  more  satisfactory,  and  the  form  of 
damping  function  has  been  investigated  in  some  detail  (see  ref.  55  of  my  paper). 


RIVAIL  -  You  mentioned  charge  flow  as  an  important  contribution  to  the  molecular 
polarization  phenomenon.  This  raises  a  very  difficult  problem  when  one  tries  to  define 
transferable  properties  because  the  molecule’s  electric  neutrality  may  be  violated.  Do 
you think that there is a hope to reasonably account for this effect through
phenomenological dipole or multipole distributed polarizabilities?

STONE  -  In  saturated  systems,  the  flow  of  charge  is  likely  to  be  a  short-range  effect, 
and  the  flow  of  charge  along  a  bond  can  be  represented  by  an  induced  dipole  (and 
higher  moments)  in  the  bond.  In  conjugated  systems,  however,  charge-flow  effects 
extend  over  greater  distances  and  cannot  so  easily  be  represented  by  local 
polarizabilities. 


GRESH - About the charge-transfer term: using the Murrell-Randić-Williams
perturbation  treatment,  this  term  appears  as  a  distinct  term  in  its  own  right.  Using 
ab-initio  super  molecule  computations,  its  magnitude  was  often  overestimated 
because  of  basis  set  extension  effects.  Nevertheless  with  large  basis  sets,  and 
particularly  in  studies  of  cation-ligand  interactions,  this  term  comes  out  from  energy 
decomposition  computations,  with  radial  and  angular  dependences  of  its  own. 
Incorporating $E_{ct}$ explicitly in our SIBFA procedure by means of a development into
analytical formulas based on the MRW formula, we were able to account for such
dependencies. I think that $E_{ct}$ ought to be implemented in an additive scheme,
with  care  concerning  its  calibration. 

About the short-range repulsion term: the most common implementation of $E_{rep}$ is
under  the  form  of  atom-atom  terms.  It  was  proposed  by  Murrell  that  Erep  is  proportional 
to  the  square  of  the  overlap  integral  involving  the  localized  orbitals.  Representing  Erep 
under  the  form  of  bond-bond,  bond-lone  pair  and  lone  pair-lone  pair  interactions,  we 
could  reproduce  quite  well  the  radial  and  angular  evolution  of  the  corresponding  term 
in  the  first-order  term  of  the  ab-initio  supermolecule  computation. 

STONE  -  In  the  limit  of  large  basis  sets,  the  genuine  physical  phenomenon  of  charge 
transfer  is  completely  described  as  a  part  of  the  induction  energy.  In  the 
sum-over-states  formalism,  charge-transfer  is  described  by  excitations  from  occupied 
orbitals  of  one  molecule  (A,  say)  to  virtual  orbitals  of  the  other  (B),  while  induction 
energy  is  described  by  excitations  from  occupied  orbitals  of  A  to  virtual  orbitals  of  A.  If 
we  use  a  complete  basis  for  A,  any  virtual  orbital  of  B  can  be  expanded  in  terms  of 
orbitals of A, and the A→B* excitations duplicate A→A* excitations. The 'charge
transfer' energy described by the A→B* excitations is therefore spurious. In a smaller
basis, the duplication is less of a problem, but then the 'charge-transfer' terms are
heavily contaminated by basis set superposition error. It follows that the inclusion of
'charge transfer' in A→B* excitations is a very treacherous procedure. In any case, it
has  become  clear  in  recent  years  that  many  phenomena  attributed  to  charge  transfer 
effects  are  well  accounted  for  by  electrostatic  interactions,  provided  that  the 
electrostatic  effects  are  accurately  described.  The  relationship  between  the  overlap 
and  the  repulsion  energy  has  been  noticed  by  several  workers.  However  no  simple 
formula  representing  repulsion  in  terms  of  overlap  has  been  derived  from  first 
principles,  nor  is  it  likely  to  exist.  Moreover  the  overlap  formulation  is  inadequate  as  an 
accurate  route  to  the  interaction  energy,  and  is  too  complicated  to  use  as  part  of  a 
simple  model. 


SMITH  -  In  your  dipeptide  calculations  you  noticed  an  asymmetry  in  the  methyl 
group(s).  This  interesting  phenomenon  is  seen  in  many  classes  of  organic  molecules. 
Your coworkers attributed it to polarisation effects. What was the evidence for this?


STONE  -  The  asymmetry  noticed  by  Faerman  &  Price  is  an  asymmetry  of  the  charge 
distribution,  rather  than  the  geometrical  asymmetry  that  Dr.  Smith  has  in  mind.  For 
instance,  in  studies  where  the  methyl  hydrogen  atoms  were  allowed  to  carry  charges, 
those charges were not equal. It is thought that the inequalities are a consequence of
polarization in the electric fields of neighbouring atoms, but at present this is only a
conjecture. 


PEPE  -  The  long  range  interactions  play  an  important  role  in  large  systems,  do  you 
think that your method is accurate enough to get good results on large systems?

STONE - Repulsions decay like $e^{-\alpha R}$ and dispersion like $R^{-6}$, but electrostatic
interactions between neutral polar molecules behave like $R^{-3}$ and interactions between
charges like $R^{-1}$. Consequently it is essential to use accurate charge distributions to
get  the  long-range  interactions  right. 


PERAHIA  -  How  good  is  the  transferability  of  multipoles  from  one  conformation  to 
another?

STONE  -  This  is  discussed  in  detail  in  the  paper  by  Faerman  &  Price  (ref.  26  of  my 
paper). 


ANGYAN - The idea of introducing a molecular "capacitance matrix" to describe
intramolecular charge-flow effects seems to be very promising. Can you illustrate these
charge-charge polarizability matrices for the case of simple polyatomic molecules (like
H2O) or for aromatic systems?

STONE - We have some results for benzene and naphthalene, but they are not yet
ready  for  publication. 
SUMMARY

It is shown how an accurate and reliable molecular force field for proteins,
nucleic acids and saccharides can be deduced from vibrational spectroscopies. A
critical comparison is made with the more commonly used force fields.

INTRODUCTION 

The knowledge of the three-dimensional structure of molecules and
macromolecules is of primary importance. Molecular modeling techniques
play a large part in chemistry-based disciplines such as crystallography, organic
chemistry, inorganic chemistry, theoretical chemistry, polymer chemistry,
medicinal chemistry, biochemistry, spectroscopy, enzymology, pharmacology and
so on. There are also a number of experimental techniques that give three-dimensional
pictures of molecules, among which crystal structure analysis is the
one that provides the most accurate structural information about biomolecules.

Computational chemistry has made great advances since the advent of modern
digital computers. Color graphics computer technology is making it increasingly
possible to visualize and manipulate molecules with thousands of atoms. The
computational methods leading to molecular structures, energies and properties
(namely empirical, semi-empirical and ab initio calculations) are based on the
evaluation of the molecular potential energy surface. Quantum mechanical
calculations have the capability of being the most accurate technique, but they
require a great deal of computer time. Molecular mechanics uses an analytical
expression for the potential energy function based on empirical force constants or
parameters. Once the potential energy function is known, the equilibrium
structure of a molecule is obtained through minimization techniques. Molecular
dynamics may also be performed in order to let the molecules reach the global
minimum energy conformation, instead of the local one obtained through energy
minimization procedures, and to predict their dynamical behaviour.

One  of  the  major  limitations  of  molecular  mechanics  and  molecular  dynamics 
techniques is the reliability of the force field itself. The aim of this paper is to
show how an accurate force field may be determined from vibrational spectroscopy
in  order  to  improve  the  predictive  capabilities  of  the  molecular  modeling 
methods. 

THE  MOLECULAR  FORCE  FIELD:  FROM  THE  UREY-BRADLEY- 
SHIMANOUCHI  MODEL  TO  THE  LOCAL  SYMMETRY  FORCE  FIELD. 

A molecule is an ensemble of atomic nuclei with surrounding electrons. This
ensemble interacts with other molecules through attractive and repulsive
phenomena. In the Born-Oppenheimer approximation the electrons adjust
instantaneously to variations in the nuclear positions. In a stable electronic state
the potential energy V of a molecule is expressed as a function of the nuclear
positions: $V = V(x_1, y_1, z_1; x_2, y_2, z_2; \ldots; x_n, y_n, z_n)$, where $x_i$, $y_i$ and $z_i$ are
the cartesian coordinates of the $i$th atom and $n$ is the number of nuclei in the
molecule.

The analytical expression of the potential energy V of an n-atom molecule is
not known. The only certainty is that its value depends only on the relative
positions of the nuclei: it does not change with rotation or translation of
the molecule as a whole. For small displacements, the potential energy function is
usually expanded in a Taylor series in terms of p internal coordinates $q_i$, where p is the
number of internal degrees of freedom:

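$$V = V_0 + \sum_i f_i\,q_i + \frac{1}{2!}\sum_{ij} f_{ij}\,q_i q_j + \frac{1}{3!}\sum_{ijk} f_{ijk}\,q_i q_j q_k + \cdots$$

$$f_i = \left(\frac{\partial V}{\partial q_i}\right)_0,\qquad f_{ij} = \left(\frac{\partial^2 V}{\partial q_i\,\partial q_j}\right)_0 \quad\text{and}\quad f_{ijk} = \left(\frac{\partial^3 V}{\partial q_i\,\partial q_j\,\partial q_k}\right)_0$$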


are called the linear, quadratic and cubic force constants, respectively. As we are
interested in variations of the potential energy with the displacement coordinates,
its origin may be chosen as $V_0 = 0$. Then the equilibrium configuration of the
molecule must be at a minimum of the potential energy, which requires that all the
$q_i$'s are independent and the linear force constants $f_i$ are equal to zero. Again, for
small displacements of the nuclei around their equilibrium positions, we can
neglect the cubic and higher terms in the Taylor expansion of V, so that:

$$V = \frac{1}{2}\sum_{ij} f_{ij}\,q_i q_j$$
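The passage from the Taylor series to the harmonic form can be illustrated numerically. The sketch uses an invented two-coordinate potential (a Morse-like stretch $q_1$, a harmonic bend $q_2$ and a small bilinear coupling) with its minimum at the origin, obtains the linear and quadratic force constants $f_i$ and $f_{ij}$ by central finite differences, and compares the harmonic energy with the full potential for a small and a larger displacement. Step sizes and all parameter values are assumptions of the sketch.

```python
import math


def toy_potential(q):
    """Hypothetical potential: Morse-like stretch, harmonic bend, bilinear coupling."""
    q1, q2 = q
    return 0.5 * (1.0 - math.exp(-1.8 * q1)) ** 2 + 0.3 * q2 ** 2 + 0.05 * q1 * q2


def gradient(V, q0, h=1e-5):
    """Central-difference first derivatives (linear force constants f_i)."""
    g = []
    for i in range(len(q0)):
        qp, qm = list(q0), list(q0)
        qp[i] += h
        qm[i] -= h
        g.append((V(qp) - V(qm)) / (2.0 * h))
    return g


def hessian(V, q0, h=1e-4):
    """Central-difference second derivatives (quadratic force constants f_ij)."""
    n = len(q0)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                q = list(q0)
                q[i] += si * h
                q[j] += sj * h
                return V(q)
            F[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return F


q_eq = [0.0, 0.0]
f_lin = gradient(toy_potential, q_eq)
F = hessian(toy_potential, q_eq)


def harmonic(q):
    return 0.5 * sum(F[i][j] * q[i] * q[j] for i in range(2) for j in range(2))


for q in ([0.005, -0.004], [0.05, -0.04]):
    exact = toy_potential(q)
    print(q, exact, harmonic(q), abs(harmonic(q) - exact) / exact)
```

The linear constants vanish at the minimum, and the harmonic approximation errs by under 1% for the small displacement but by nearly 10% for the larger one, where the neglected cubic term of the stretch matters.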

The set of $f_{ij}$ matrix elements constitutes the general force field. The more
commonly used molecular mechanics computer programs are based on a potential
energy function expressed in terms of internal coordinates (bond stretching,
bond angle bending and torsional coordinates) and deal with a number of internal
coordinates greater than the number of vibrational degrees of freedom, p = 3n-6. The
problem of how to choose these p basis coordinates has been treated in detail
by T. Shimanouchi (ref. 1). A number of model force fields have been presented in
the literature, among which the Urey-Bradley-Shimanouchi force field (UBSFF)
has been very useful in explaining the vibrational spectra of many complex
molecules. In this model it is assumed that the off-diagonal elements of the F
matrix (composed of the $f_{ij}$ elements) are due to the interaction between
nonbonded atoms.

In describing the potential energy function, we highlight the fundamental differences from the more common expressions (e.g. those of the CHARMM (ref.2) or AMBER (ref.3) computer programs). The Urey-Bradley-Shimanouchi potential energy function is composed of terms representing variations of bond stretchings, bond angle bendings, out-of-plane bendings and dihedral torsions, to which van der Waals interactions, electrostatic interactions and hydrogen bonds are added:


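$$
\begin{aligned}
V(r) ={}& \frac{1}{2}\sum_{\text{bonds}} K_b\,(r_{ij}-r_0)^2
+ \frac{1}{2}\sum_{\text{angles}} H_\theta\, r_{ij}\, r_{jk}\,(\theta_{ijk}-\theta_0)^2
+ \frac{1}{2}\sum_{\text{opb}} \gamma\,(\Delta\pi_{ijkl})^2
+ \frac{1}{2}\sum_{\text{torsions}} Y_{ij}\,(\Delta\tau_{ij})^2 \\
&+ \sum_{\substack{\text{nonbonded pairs}\\ r<6\,\text{Å}}} \left[A\,e^{-Br} - \frac{C}{r^6}\right]
+ \sum_{\substack{\text{nonbonded pairs}\\ r<6\,\text{Å}}} \frac{e_i\,e_j}{\varepsilon\, r} \\
&+ \frac{1}{2}\sum_{1,3\ \text{distances}} F_{ij}\,(q_{ij}-q_0)^2
+ \sum_{1,3\ \text{distances}} F'_{ij}\, q_0\,(q_{ij}-q_0)
+ V_\kappa
+ \sum_{\text{H-bonds}} V_{hb}
\end{aligned}
$$

where $r_{ij}$ is the bond length between atoms i and j, $\theta_{ijk}$ the bond angle between atoms i, j, k, $q_{ij}$ the 1,3 distance between nonbonded atoms i and j, $\Delta\pi_{ijkl}$ the out-of-plane bending (opb) coordinate defined by $\Delta\psi\,\sin\theta_{jik}$ ($\psi$ being the angle between the bond i-l and the plane jik), $\Delta\tau_{ij}$ the internal rotation coordinate around the bond i-j, $K_b$ the bond-stretching force constant, $H_\theta$, F and F' the attractive bond angle force constant, the repulsive bond angle force constant and the geminal force constant, respectively, $\gamma$ and Y the force constants associated with the opb coordinate and the torsional coordinate, A, B and C the Buckingham parameters, $e_i$, $e_j$ the atomic partial charges and $\varepsilon$ the dielectric constant; $V_\kappa$ collects the Kappa terms, built with the Kappa (internal tension) force constant $\kappa$ from the tetrahedral bond angle displacements $\theta_{ijk}-\theta_0$ (squared terms and products of pairs of such displacements), and $V_{hb}$ is the hydrogen-bond function given below. $r_0$, $\theta_0$ and $q_0$ are equilibrium parameters.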
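To show how the angle-dependent part of this function behaves, the sketch below evaluates, for a single H-C-H angle, the bending term, the 1,3 (Urey-Bradley) repulsion and the linear F' term. It takes H(HCH) = 0.332 and F(HCH) = 0.279 mdyne/Å from the CH2 entries of Table 1 and F' = -0.1 F, as quoted below for an r^-9 repulsion. The C-H bond length of 1.09 Å, the ideal tetrahedral reference angle and the function name `ub_angle_terms` are assumptions of this illustration, not parameters taken from the paper.

```python
import math


def _sub(u, v):
    return [x - y for x, y in zip(u, v)]


def _norm(u):
    return math.sqrt(sum(c * c for c in u))


def ub_angle_terms(ri, rj, rk, H, F, theta0, rij0, rjk0, Fprime=None):
    """Urey-Bradley-type contributions of one angle i-j-k (j is the apex atom).

    Lengths in angstrom, angles in radians, H and F in mdyne/angstrom, so the
    energies come out in mdyne*angstrom (1 mdyne*angstrom = 1e-18 J).
    """
    if Fprime is None:
        Fprime = -0.1 * F  # F' = -0.1 F (r^-9 repulsion case)
    a, b = _sub(ri, rj), _sub(rk, rj)
    rij, rjk = _norm(a), _norm(b)
    cos_t = sum(x * y for x, y in zip(a, b)) / (rij * rjk)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))
    q = _norm(_sub(ri, rk))  # current 1,3 distance
    # equilibrium 1,3 distance from the law of cosines
    q0 = math.sqrt(rij0 ** 2 + rjk0 ** 2 - 2 * rij0 * rjk0 * math.cos(theta0))
    return {
        "bend": 0.5 * H * rij * rjk * (theta - theta0) ** 2,
        "ub": 0.5 * F * (q - q0) ** 2,
        "linear": Fprime * q0 * (q - q0),
    }


theta0 = math.acos(-1.0 / 3.0)  # ideal tetrahedral angle (assumption)
r0 = 1.09                       # assumed C-H bond length, angstrom


def h_at(theta):
    """Second H atom in the xy-plane at angle theta from the first C-H bond."""
    return [r0 * math.cos(theta), r0 * math.sin(theta), 0.0]


C, H1 = [0.0, 0.0, 0.0], [r0, 0.0, 0.0]
params = dict(H=0.332, F=0.279, theta0=theta0, rij0=r0, rjk0=r0)
eq = ub_angle_terms(H1, C, h_at(theta0), **params)
bent = ub_angle_terms(H1, C, h_at(theta0 + 0.05), **params)
print("equilibrium:", eq)
print("opened by 0.05 rad:", bent)
```

Opening the angle raises the bending and 1,3 repulsion terms, while the linear F' term becomes negative; on its own it does not vanish at first order, which is why it only makes sense together with the other terms of the function.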

Inspection of this equation reveals that it is a rather complicated potential energy function. Slight differences from CHARMM and AMBER appear in the definition of the out-of-plane bending potential (sine term) (ref.4), the dihedral torsional potential (harmonic approximation) and the van der Waals interactions (we use a Buckingham potential instead of a Lennard-Jones potential). More striking divergences occur, however, for bond angles and hydrogen bonds. The term representing bond angles contains the attractive expression but also the changes in the distance between the first and the third atom of an angle, the latter contribution being neglected in CHARMM and AMBER. This definition is by no means a subtlety: the vibration frequencies of polyatomic molecules cannot be calculated satisfactorily without taking the atomic repulsion into account. The F' term (F' = -0.1 F for an r^-9 repulsion) has no effect on vibration frequencies, but it is required so that the energy is a minimum when all the internal coordinate displacements vanish, i.e. for $q_i = 0$, i = 1, 2, ..., 3N-6. This argument is, however, only valid if the coordinates are independent. If redundant coordinates are included (for example, six bond angle coordinates are defined for the methane molecule, whereas only five corresponding vibration frequencies exist), the argument must be revised by using the explicit redundancy condition to second order, and Kappa terms must be added. It should be emphasized that neither CHARMM nor AMBER takes the redundancy condition into account. The Kappa terms defined in the empirical potential energy function correspond to the case of a perfect tetrahedron.
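The redundancy condition for a tetrahedral centre can be checked numerically. The sketch below builds a regular tetrahedron (carbon at the origin, unit C-H bonds for simplicity), moves the four H atoms along an arbitrary, hypothetical displacement pattern, and compares the sum of the six H-C-H angle displacements with the sum of their magnitudes. The sum is second order in the displacement, which is why only five of the six angle coordinates are independent and why the redundancy condition must be carried to second order.

```python
import math
from itertools import combinations


def angle(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return math.acos(dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))))


# Regular tetrahedron with the carbon at the origin and unit C-H bonds
s = 1 / math.sqrt(3)
ref = [[s, s, s], [s, -s, -s], [-s, s, -s], [-s, -s, s]]
theta0 = math.acos(-1.0 / 3.0)

# Arbitrary (hypothetical) displacement pattern for the four H atoms
pattern = [[0.3, -0.2, 0.5], [-0.1, 0.4, 0.2], [0.2, 0.1, -0.3], [-0.4, -0.3, 0.1]]


def angle_deviations(eps):
    """The six H-C-H angle displacements after moving each H by eps * pattern."""
    bonds = [[x + eps * dx for x, dx in zip(b, db)] for b, db in zip(ref, pattern)]
    return [angle(bonds[i], bonds[j]) - theta0 for i, j in combinations(range(4), 2)]


for eps in (1e-2, 1e-3):
    dev = angle_deviations(eps)
    print(f"eps={eps:g}: sum of deviations {sum(dev):.2e}, "
          f"sum of |deviations| {sum(abs(d) for d in dev):.2e}")
```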

The hydrogen-bonding potential energy function (ref.5) is a combination of van der Waals interactions and an explicit harmonic function composed of terms representing bond length, bond angles and torsions; for a peptide hydrogen bond:

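$$
\begin{aligned}
V_{hb} ={}& \sum_{\substack{\text{nonbonded pairs}\\ r<6\,\text{Å}}} \left[A\,e^{-Br} - \frac{C}{r^6}\right]
+ \frac{1}{2} K_{\mathrm{O\cdots H}}\,\big(\Delta r(\mathrm{O\cdots H})\big)^2
+ \frac{1}{2} H_\alpha\, r(\mathrm{C{=}O})\, r(\mathrm{H\cdots O})\,(\Delta\alpha)^2 \\
&+ \frac{1}{2} H_\beta\, r(\mathrm{N{-}H})\, r(\mathrm{H\cdots O})\,(\Delta\beta)^2
+ \frac{1}{2} Y\,(\Delta\tau)^2 + \frac{1}{2} Y'\,(\Delta\tau')^2 + \frac{1}{2} Y''\,(\Delta\tau'')^2
\end{aligned}
$$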

where α and β are the C=O···H and N-H···O bond angles, respectively, and Δτ, Δτ' and Δτ'' the internal rotation displacements about the C=O, O···H and N-H bonds, respectively. The energy is essentially dominated by the bond-stretching term. The parametrization of CHARMM and AMBER is different: there, the hydrogen bond energy is dominated by its electrostatic interactions. This representation yields potential surfaces in overall good agreement with scaled ab initio surfaces, but it has difficulty reproducing hydrogen-bonding vibrations in the liquid state. We therefore think that the best potential energy function should include an explicit bond stretching term as well as van der Waals and electrostatic interactions.
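A minimal sketch of the distance and angle part of such a hydrogen-bond function follows. It assumes that only the H···O pair enters the Buckingham sum and it leaves out the three torsional terms; the parameter values in `p` are hypothetical placeholders (kcal/mol, Å, radians), not the parameters of ref.5.

```python
import math


def buckingham(r, A, B, C):
    """Van der Waals term A*exp(-B*r) - C/r**6 for one nonbonded pair."""
    return A * math.exp(-B * r) - C / r ** 6


def v_hb(r_HO, alpha, beta, r_CO, r_NH, p):
    """Stretching and bending part of a peptide hydrogen-bond energy.

    r_HO: H..O distance; alpha: C=O..H angle; beta: N-H..O angle (radians).
    Only the H..O pair enters the Buckingham sum, and the three torsional
    terms are left out of this sketch.
    """
    e = buckingham(r_HO, p["A"], p["B"], p["C"])
    e += 0.5 * p["K"] * (r_HO - p["r0"]) ** 2
    e += 0.5 * p["H_alpha"] * r_CO * r_HO * (alpha - p["alpha0"]) ** 2
    e += 0.5 * p["H_beta"] * r_NH * r_HO * (beta - p["beta0"]) ** 2
    return e


# Hypothetical parameters (kcal/mol, angstrom, radians), for illustration only
p = {"A": 2.0e4, "B": 4.0, "C": 100.0, "K": 10.0, "r0": 1.9,
     "H_alpha": 0.1, "H_beta": 0.1,
     "alpha0": math.radians(150.0), "beta0": math.radians(170.0)}

for r in (1.7, 1.8, 1.9, 2.0, 2.1):
    print(f"r(H..O) = {r:.1f}  V_hb = {v_hb(r, p['alpha0'], p['beta0'], 1.23, 1.01, p):.3f}")
```

With a stiff explicit stretching constant, the harmonic term rather than the Buckingham term controls the curvature around the reference H···O distance, which is the behaviour the text argues for.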

The use of the harmonic potential energy function in the equations of motion leads to solutions that describe harmonic molecular vibrations. The corresponding frequencies may be compared with experimental data obtained from infrared, Raman and resonance Raman spectra. The force constants or parameters used in molecular mechanics studies are commonly optimized to give the best fit between calculated and experimental properties such as geometries, conformational energies and heats of formation. Only recently has it been rediscovered that vibrational spectroscopy can be used with great accuracy to derive force field parameters (refs.6,7). The parameters are most closely associated with the vibrational frequencies and their isotope effects. Moreover, Coriolis coupling constants, centrifugal distortion constants, mean amplitudes of vibration and rotation-vibration coupling constants also give information on force field parameters. When we are dealing, for example, with Cl, Br and I heavy-atom vibrations, the Urey-Bradley-Shimanouchi force field is a good approximation. For hydrogen atom vibrations, this force field often gives frequencies far from the observed ones; for example, a stretching-stretching interaction constant has to be introduced in order to explain the high-frequency spectra of methyl or methylene groups. In the same way, a bending-bending interaction constant is needed in order to assign the mid-frequency spectral range. Introducing such parameters leads to the so-called Modified Urey-Bradley-Shimanouchi force field (MUBFF). The MUBFF is generally close to the general force field. For a CH3-X group the MUBFF consists of nine force constants: K(CH), K(CX), H(HCH), H(HCX), F(HCH), F(HCX), the internal tension, the stretching-stretching constant and the bending-bending constant, which are obtained from nine experimental frequencies. However, the nine normal vibrations of CH3-X are usually described as the CH3 totally symmetric stretching (TS), the CH3 degenerate stretching (DS, DS'), the CH3 symmetric deformation (SD), the CH3 degenerate deformation (DD, DD'), the CH3 degenerate rocking (DR, DR') and the CX stretching vibrations.
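The link between force constants and observed frequencies can be illustrated with Wilson's GF method. The sketch below treats the CH2 stretches as two 1x1 symmetry blocks, using F(SS) = 4.735 and F(AS) = 4.576 mdyne/Å from Table 1 and the textbook G-matrix elements of a bent XY2 fragment. Coupling with the bending coordinates and with the X and Y neighbours is ignored, and the ideal tetrahedral HCH angle and the atomic masses are assumptions, so the resulting wavenumbers are only indicative.

```python
import math

AMU = 1.66053906660e-27   # kg
C_CM = 2.99792458e10      # speed of light, cm/s
MDYN_PER_A = 100.0        # 1 mdyne/angstrom = 100 N/m


def wavenumber(lam):
    """Convert a GF eigenvalue in (mdyne/angstrom)/amu into cm^-1."""
    return math.sqrt(lam * MDYN_PER_A / AMU) / (2 * math.pi * C_CM)


def xy2_stretch_wavenumbers(F_ss, F_as, m_x, m_y, angle_deg):
    """Symmetric and antisymmetric stretches of a bent XY2 fragment.

    Each species is treated as a 1x1 block, lambda = G * F, with
      G_ss = mu_Y + mu_X (1 + cos a),  G_as = mu_Y + mu_X (1 - cos a).
    Coupling with the bending coordinates is neglected.
    """
    mu_x, mu_y = 1.0 / m_x, 1.0 / m_y
    cos_a = math.cos(math.radians(angle_deg))
    g_ss = mu_y + mu_x * (1 + cos_a)
    g_as = mu_y + mu_x * (1 - cos_a)
    return wavenumber(g_ss * F_ss), wavenumber(g_as * F_as)


# F(SS), F(AS) of the CH2 group from Table 1; masses and angle are assumptions
nu_ss, nu_as = xy2_stretch_wavenumbers(4.735, 4.576, m_x=12.011, m_y=1.008, angle_deg=109.47)
print(f"CH2 symmetric stretch     ~ {nu_ss:.0f} cm-1")
print(f"CH2 antisymmetric stretch ~ {nu_as:.0f} cm-1")
```

Even in this crude two-block picture the antisymmetric stretch comes out above the symmetric one, although F(AS) is the smaller force constant; the difference is carried by the G-matrix elements.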

A suitable choice of basis coordinates in the expansion of V should correspond to these modes. The force constants expressed on the basis of symmetry coordinates (local symmetry force field, LSFF, or group coordinate force field, GCFF) differ from the MUBFF constants, as shown in Table 1, where the MUBFF, the LSFF and their relationship are given together with the definition of the coordinates.
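The change of basis from internal to symmetry coordinates is the congruence transformation F_S = U F U^T, where the rows of U are the symmetry coordinates written on the internal ones. The sketch below applies it to the three CH3 stretches with the TS, DS and DS' definitions of Table 1. K(CH) = 4.301 mdyne/Å is the CH3 value of Table 1, while the stretch-stretch interaction constant `f` is a hypothetical value chosen for illustration.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
# Rows: TS, DS, DS' written on the C-H stretches r1, r2, r3 (Table 1 definitions)
U = [[1 / s3, 1 / s3, 1 / s3],
     [2 / s6, -1 / s6, -1 / s6],
     [0.0, 1 / s2, -1 / s2]]

K = 4.301  # K(CH) of the CH3 group, Table 1 (mdyne/angstrom)
f = 0.05   # hypothetical stretch-stretch interaction constant (mdyne/angstrom)
F_int = [[K, f, f],
         [f, K, f],
         [f, f, K]]

F_sym = matmul(matmul(U, F_int), transpose(U))  # F_S = U F U^T
for name, row in zip(["TS", "DS", "DS'"], F_sym):
    print(f"{name:4s}", [round(x, 4) for x in row])
```

The symmetry force field is diagonal here: F(TS) = K + 2f and F(DS) = F(DS') = K - f, so a single interaction constant already splits the symmetric and degenerate stretches.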

TABLE 1

Force constants in the local symmetry force field.

Starting point: Urey-Bradley-Shimanouchi force field

Stretchings (mdyne/Å)
   K(CC) = 2.563
   K(CH) = 4.301     CH3 group
   K(CH) = 3.936     CH2 group

Attractive and repulsive bond-angle bendings (mdyne/Å)
   H(CCC) = 0.287    F(CCC) = 0.369
   H(HCH) = 0.332    F(HCH) = 0.279    CH2 group
   H(HCC) = 0.191    F(HCC) = 0.537    CH2 group
   H(HCH) = 0.378    F(HCH) = 0.195    CH3 group
   H(HCC) = 0.208    F(HCC) = 0.385    CH3 group

Intramolecular tensions
   κ(CH3) = 0.025    κ(CH2) = 0.054

Interaction between CH2 bendings (mdyne·Å), X-CH2-Y group
   2V = L (XCH)(YCH') - L (XCH)(XCH'), with L = 0.011 mdyne·Å

Definition of the symmetry coordinates (ip = in plane, op = out of plane)

CH3-X group: 5 atoms, 6 bending coordinates, 4 stretching coordinates ==> 9 independent coordinates
   Redundancy: S = 1/√6 (α12 + α13 + α23 + β1X + β2X + β3X)
   TS  = 1/√3 (r1 + r2 + r3)                                  ip
   DS  = 1/√6 (2r1 - r2 - r3)           ip     DS' = 1/√2 (r2 - r3)       op
   SD  = 1/√6 (α12 + α13 + α23 - β1X - β2X - β3X)             ip
   DD  = 1/√6 (2α12 - α13 - α23)        ip     DD' = 1/√2 (α12 - α13)     op
   DR  = 1/√6 (2β1X - β2X - β3X)        ip     DR' = 1/√2 (β2X - β3X)     op
   CX stretching

X-CH2-Y group:
   Redundancy: S = 1/√6 (α + β1X + β2Y + β2X + β1Y + XCY)
   SS = 1/√2 (r1 + r2)                  AS = 1/√2 (r1 - r2)
   SC = 1/√20 (4α - β1X - β1Y - β2X - β2Y)     ip
   WA = 1/2 (β1X + β2X - β1Y - β2Y)            ip
   TW = 1/2 (β1X - β2X - β1Y + β2Y)            op
   RO = 1/2 (β1X - β2X + β1Y - β2Y)            op
   CX stretching, CY stretching, XCY bending

The local symmetry force field

For a methyl group
   F(SD) = 0.550 H(HCH) + 0.238 F(HCH) + 0.839 H(CCH) + 0.322 F(CCH) + 0.530 κ(CH3) = 0.584 mdyne·Å
   F(DD) = 1.188 H(HCH) + 0.475 F(HCH) - 0.177 κ(CH3) = 0.534 mdyne·Å
   F(DR) = 1.679 H(CCH) + 0.645 F(CCH) - 0.177 κ(CH3) = 0.593 mdyne·Å

For a methylene group
   F(SC) = 0.950 H(HCH) + 0.380 F(HCH) + 0.336 H(CCH) + 0.129 F(CCH) - 0.318 κ(CH2) = 0.538 mdyne·Å
   F(WA) = 1.679 H(CCH) + 0.645 F(CCH) + 0.530 κ(CH2) - 2L = 0.674 mdyne·Å
   F(RO) = 1.679 H(CCH) + 0.645 F(CCH) + 0.530 κ(CH2) + 2L = 0.717 mdyne·Å
   F(TW) = 1.679 H(CCH) + 0.645 F(CCH) - 0.884 κ(CH2) = 0.619 mdyne·Å
   F(SS) = 4.735 mdyne/Å     F(AS) = 4.576 mdyne/Å
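The LSFF expressions in Table 1 are plain linear combinations of the UBSFF constants, so they can be checked directly. The sketch below evaluates the methylene expressions and the methyl rocking expression with the Table 1 values. It assumes L = 0.011 mdyne·Å and a negative sign for the κ(CH2) coefficient of F(TW), the choices that reproduce the tabulated 0.674, 0.717 and 0.619; the function names are illustrative only.

```python
# UBSFF constants from Table 1 (bendings in mdyne/angstrom, as tabulated)
CH2 = {"H_HCH": 0.332, "F_HCH": 0.279, "H_CCH": 0.191, "F_CCH": 0.537, "kappa": 0.054}
CH3 = {"H_HCH": 0.378, "F_HCH": 0.195, "H_CCH": 0.208, "F_CCH": 0.385, "kappa": 0.025}
L_BB = 0.011  # CH2 bending-bending interaction constant L (mdyne*angstrom)


def lsff_methylene(p, L):
    """Methylene local symmetry force constants from the coefficients of Table 1."""
    base = 1.679 * p["H_CCH"] + 0.645 * p["F_CCH"]
    return {
        "SC": (0.950 * p["H_HCH"] + 0.380 * p["F_HCH"] + 0.336 * p["H_CCH"]
               + 0.129 * p["F_CCH"] - 0.318 * p["kappa"]),
        "WA": base + 0.530 * p["kappa"] - 2 * L,
        "RO": base + 0.530 * p["kappa"] + 2 * L,
        # a negative kappa coefficient reproduces the tabulated F(TW) = 0.619
        "TW": base - 0.884 * p["kappa"],
    }


def lsff_methyl_rocking(p):
    """F(DR) of a methyl group."""
    return 1.679 * p["H_CCH"] + 0.645 * p["F_CCH"] - 0.177 * p["kappa"]


for name, value in lsff_methylene(CH2, L_BB).items():
    print(f"F({name}) = {value:.3f} mdyne*A")
print(f"F(DR) = {lsff_methyl_rocking(CH3):.3f} mdyne*A")
```

Wagging and rocking share the same UBSFF part and differ only through the ±2L interaction term, which is what makes L observable in the spectra.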

The force constants of the functional groups of organic molecules are usually influenced by the nearest neighbouring groups, and they may be transferred to molecules having the same groups. For ring molecules, the redundancy conditions become complex. However, it is always possible to define local symmetry coordinates for rings. A good example is given by the benzene molecule, for which the UBSFF has shown strong limitations. Table 2 compares our LSFF parameters with published data.

TABLE 2

In-plane vibrations of benzene

Species  Force      Calculated          Force constant (mdyne/Å)
         constant   frequency (cm-1)    this work    MP2/6-31G* (ref.8)
A1g      F11          993               7.6385        7.638
         F12                            0.            0.135
         F22         3063.6             5.0912        5.625
A2g      F11         1366.4             0.5151        0.903
B1u      F11         1010.              0.8571        0.639
         F12                            0.           -0.163
         F22         3057.1             5.0674        5.562
B2u      F11         1309.4             4.0673        4.049
         F12                           -0.25          0.340
         F22         1146.              0.4710        0.864
E2g      F11         1178.3             0.54          0.838
         F12                            0.           -0.106
         F13                           -0.45          0.292
         F14                           -0.153        -0.139
         F22         3055.7             5.077         5.571
         F23                            0.            0.076
         F24                            0.            0.029
         F33         1599.5             6.6546        6.938
         F34                           -0.63         -0.449
         F44          606.9             0.933         0.942
E1u      F11         1035.              5.356         7.568
         F12                           -0.18          0.249
         F13                            0.            0.176
         F22         1479.5             0.5572        0.973
         F23                            0.            0.008
         F33         3064               5.1052        5.60

Out-of-plane vibrations of benzene

Force constants and frequencies

Species  Force      Calculated          Force constant (mdyne/Å)
         constant   frequency (cm-1)    this work
B2g      F11          996.9             0.3845
         F12                           -0.086
         F22          701.2             0.3595
E2u      F11          969.6             0.355
         F12                           -0.095
         F22          401.7             0.367
E1g      F11          843.4             0.5234
A2u      F11          667.1             0.382
DERIVING EMPIRICAL POTENTIAL FUNCTION PARAMETERS FROM VIBRATIONAL SPECTROSCOPY.

There are several difficulties in fitting vibrational spectra. If n_exp vibrational frequencies have been obtained, we are able to calculate n_cal normal mode frequencies. In principle it can be shown that there are 2n equally good force fields giving 2n sets of calculated frequencies with completely different assignments. Furthermore, if N is the number of atoms in the molecule, the number of force constants which may be varied to improve the calculated frequencies increases as (3N-6)(3N-5)/2, while the number of observed frequencies is only 3N-6. There are several ways to overcome these difficulties. They are described in the following paragraphs, where results obtained for different classes of molecules of biological importance are presented.
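The imbalance between unknowns and observables grows quickly with molecular size. The short sketch below counts, for a non-linear molecule of N atoms, the p = 3N-6 internal degrees of freedom and the p(p+1)/2 independent quadratic force constants of a general force field, which is the (3N-6)(3N-5)/2 quoted above. The molecules are taken from this paper; molecular symmetry, which reduces the number of independent constants, is ignored here.

```python
def count_parameters(n_atoms):
    """Internal degrees of freedom p = 3N - 6 and the number of independent
    quadratic force constants f_ij of a general force field, p(p + 1)/2."""
    p = 3 * n_atoms - 6
    return p, p * (p + 1) // 2


for name, n in [("urea", 8), ("N-methylacetamide", 12), ("benzene", 12)]:
    p, n_f = count_parameters(n)
    print(f"{name:18s} N={n:2d}: {p} frequencies at most, {n_f} force constants")
```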

A vibrational force field for peptides and proteins, and the use of the transferability of force constants, isotopic substitutions, mean-square displacements of atomic distances and crystallographic temperature factors as reliability criteria (ref.9).

In order to set up the parameters of the potential energy function for proteins, we have performed normal mode calculations of a series of small molecules in the crystalline state. Urea (ref.10), N-methylacetamide (NMA) (ref.11) and N-acetyl-L-X-N-methylamide with X = Ala, Phe (ref.12) have been studied, while the molecules with X = Pro, Trp, Tyr and His are in progress.

It is generally assumed that the principle of transferability of force constants within a series of related molecules is applicable. Table 3 gives the amide vibration frequencies of the N-acetyl-L-alanine-N-methylamide molecule calculated with the molecular force field of the NMA molecule.


TABLE  3 


obs. RAMAN

obs.  I.R. 

calculated 

Assignments 

freq. (cm-1)

freq. 

freq. 

1667 

1670 

1656 

AMIDE I  C=O st.

1653 

1639 

1620 

1565 

AMIDE II  N-H ipb.

1570 

1548 

1325 

1320 

1333 

AMIDE  III  C-N  st,  C-C  st 

1300 

1303 

855 

850 

AMIDE  V  N-H  opb 

768 

743 

696 

695 

706 

AMIDE IV  C=O ipb

640 

628 

593 

617 

AMIDE VI  C=O opb

522 

520 

572 

where st = stretching, ipb = in-plane bending, opb = out-of-plane bending.

Inspection of this table reveals that the principle is valid for most vibrations, with the exception of the C=O out-of-plane bending vibration, the C=O group being close to the side chain. Specific interactions between this group and the side chain are therefore required to obtain better agreement. For all molecules, a standard deviation of the order of 5 cm-1 between observed and computed frequencies is achieved. In addition to the frequency criterion, other criteria have been used.

First, in order to ensure the adequacy of the potential energy distribution of each vibration, the effects of isotopic substitution are examined. Table 4 compares some observed characteristic frequencies of the urea-d4 molecule with the calculated ones.


TABLE 4

Obs. freq. (cm-1)   Calc. freq. (cm-1)   Assignments
2597                2583                 ND2 asymmetric stretching
2438                2419                 ND2 symmetric stretching
888                 887                  ND2 rocking
375                 375                  N-D out-of-plane bending
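The frequency criterion is usually summarized by a single deviation figure. As an illustration only, the sketch below computes the root-mean-square and mean absolute deviations for the four urea-d4 bands of Table 4. Dividing by the number of bands (rather than by a number of degrees of freedom) is an assumption of this sketch, and the ~5 cm-1 quoted above refers to the complete sets of frequencies, not to these four bands.

```python
import math

# Table 4: urea-d4, observed and calculated wavenumbers (cm-1)
observed = [2597, 2438, 888, 375]
calculated = [2583, 2419, 887, 375]


def rms_deviation(obs, calc):
    """Root-mean-square of (observed - calculated), dividing by the number of bands."""
    diffs = [o - c for o, c in zip(obs, calc)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_abs_deviation(obs, calc):
    return sum(abs(o - c) for o, c in zip(obs, calc)) / len(obs)


print(f"rms deviation: {rms_deviation(observed, calculated):.1f} cm-1")
print(f"mean |deviation|: {mean_abs_deviation(observed, calculated):.1f} cm-1")
```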


The mean-square displacements of atomic distances can then be used. Table 5 shows the good agreement between the observed and theoretical values for the N-H bond stretching coordinate displacement.

TABLE 5

Electron diffraction value    Calculated value
0.0053 Å²                     0.0051 Å²
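The order of magnitude of such a mean-square amplitude can be estimated from a single harmonic oscillator, using <Δr²> = h/(8π²μcν̃)·coth(hcν̃/2kT). The sketch below applies this formula to an isolated N-H oscillator at room temperature. The 3300 cm-1 stretching wavenumber is an assumed typical value, and a full normal coordinate calculation, like the one behind Table 5, sums contributions from all modes instead of treating the bond as a diatomic.

```python
import math

H_PLANCK = 6.62607015e-34   # J s
KB = 1.380649e-23           # J/K
C_CM = 2.99792458e10        # cm/s
AMU = 1.66053906660e-27     # kg


def mean_square_amplitude(wavenumber, m1, m2, T=298.15):
    """<dr^2> in angstrom^2 for a single harmonic bond oscillator.

    <dr^2> = h / (8 pi^2 mu c nu) * coth(h c nu / (2 k T)), nu in cm^-1, masses in amu.
    """
    mu = m1 * m2 / (m1 + m2) * AMU
    x = H_PLANCK * C_CM * wavenumber / (2 * KB * T)
    msq = H_PLANCK / (8 * math.pi ** 2 * mu * C_CM * wavenumber) / math.tanh(x)
    return msq * 1e20  # m^2 -> angstrom^2


# Assumed N-H stretching wavenumber of 3300 cm-1 (hypothetical input)
print(f"<dr^2>(N-H) ~ {mean_square_amplitude(3300.0, 14.003, 1.008):.4f} A^2")
```

For a high-frequency stretch the coth factor is essentially 1 at room temperature, so the amplitude is dominated by zero-point motion.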

The temperature factors can also be of great help when the low-frequency spectra do not reveal all the lattice vibrations and all the internal vibrations associated with torsional coordinates. Table 6 shows rather good agreement between observed and computed Debye-Waller factors for some atoms of N-acetyl-L-alanine-N-methylamide, C1H3-C(=O1)-N1H-CH(CH3)-CO-NH-CH3.

TABLE 6

Atom type    Observed value (Å²)    Calculated value (Å²)
C1           4.6 (1.0)              6.7
C(=O)        7.2 (1.2)              5.4
N1           4.8 (0.8)              5.2
O1           6.5 (0.8)              7.7
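Debye-Waller factors are related to atomic mean-square displacements by the standard isotropic relation B = 8π²<u²>, where <u²> is the mean-square displacement along one direction. The sketch below converts the observed and calculated B values of Table 6 into <u²>, the quantity a normal mode calculation produces directly.

```python
import math


def msd_from_b(B):
    """Isotropic Debye-Waller factor B (angstrom^2) -> <u^2> = B / (8 pi^2), angstrom^2."""
    return B / (8 * math.pi ** 2)


# Table 6: (observed, calculated) B values in angstrom^2
table6 = {"C1": (4.6, 6.7), "C(=O)": (7.2, 5.4), "N1": (4.8, 5.2), "O1": (6.5, 7.7)}
for atom, (b_obs, b_calc) in table6.items():
    print(f"{atom:6s} <u2> obs = {msd_from_b(b_obs):.3f}, calc = {msd_from_b(b_calc):.3f} A^2")
```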

Finally, it should be mentioned that calculations of IR and neutron vibrational intensities are in progress. The fundamental conclusion of these different studies is that the set of parameters can be quite different from those used by CHARMM or AMBER. Table 7 gives an idea of the differences in some bond-stretching force constants of NMA and in the energy barriers to internal rotation about the bonds of NMA.

TABLE 7

                        our results    CHARMM    AMBER
Kb (kcal/mol/Å²)
   C-C                  155            187       335
   N-CT                 316            261       355
E (kcal/mol)
   N-C                  17.3           20.       20.
   C-C                  1.1            0.        0.
   N-CT                 0.18           0.        0.
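Comparing force constants across programs requires a common unit and a common energy convention. The sketch below converts mdyne/Å (the unit of Table 1) into kcal/(mol Å²), using 1 mdyne/Å = 10^-18 J/Å² per molecule, and halves a constant to switch from E = ½K(r - r0)², as in the UBSFF expression above, to E = K(r - r0)², the form used in the CHARMM and AMBER bond terms. Whether the CHARMM and AMBER entries of Table 7 have already been put on the same convention is not stated, so the helper names and the example conversion are illustrative only.

```python
N_A = 6.02214076e23   # Avogadro constant, 1/mol
J_PER_KCAL = 4184.0   # thermochemical calorie


def mdyn_per_A_to_kcal(k):
    """mdyne/angstrom -> kcal/(mol angstrom^2); 1 mdyne/angstrom = 1e-18 J/angstrom^2 per molecule."""
    return k * 1e-18 * N_A / J_PER_KCAL


def drop_half(k):
    """Constant k' for E = k'(r - r0)^2, given k for E = 1/2 k (r - r0)^2."""
    return k / 2.0


k_cc = mdyn_per_A_to_kcal(2.563)  # K(CC) of Table 1
print(f"K(CC) = {k_cc:.1f} kcal/mol/A^2 with the 1/2 factor, {drop_half(k_cc):.1f} without it")
```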

In order to assess the influence of different potential energy functions on the calculation of the dynamic properties of proteins, a harmonic dynamics simulation of BPTI (bovine pancreatic trypsin inhibitor) was carried out using the Urey-Bradley-Shimanouchi force field (ref.9). The density of vibrational states in the region 0 to 200 cm-1 and the atomic fluctuations differ from the results of previous harmonic dynamics simulations that used a different empirical potential energy function. The deviation can be large for the lowest-frequency mode, as can be seen in Table 8.

TABLE 8

Authors                   Number of degrees of freedom    Lowest frequency (cm-1)
this work                 1040                            15.2
Brooks (ref.13)           1740                            3.1
Cusack et al. (ref.14)    1740                            11.0
Cusack et al. (ref.14)    2712                            9.6

Examination of the rms fluctuations of the Lys-15 residue also reveals a different dynamical behaviour (Table 9). This residue is of particular importance because a conformational transition around the Cβ-Cγ bond accompanies the binding of the inhibitor to the enzyme. Accordingly, the understanding of the mechanism of molecular recognition between the two molecular entities depends upon the potential energy function.

TABLE 9

Atom-type averaged rms fluctuations of the Lys-15 residue (Å)

Authors            Main chain    Side chain
this work          0.43          0.5
Brooks (ref.13)    0.7           1.5
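In a harmonic analysis, atomic fluctuations follow directly from the normal modes. In the classical limit, the mean-square fluctuation of a Cartesian coordinate of atom i is <Δx²> = (kT/m_i) Σ_k L_ik²/ω_k², where L_ik is the component of the normalized mass-weighted eigenvector of mode k. The 1/ω² weighting explains why a change in the lowest frequencies, such as those of Table 8, shows up in the fluctuations of Table 9. The sketch below implements this sum; the eigenvector components used in the example are hypothetical.

```python
import math

KB = 1.380649e-23          # J/K
AMU = 1.66053906660e-27    # kg
C_CM = 2.99792458e10       # cm/s


def msf_from_modes(mass_amu, components, wavenumbers, T=300.0):
    """Classical mean-square fluctuation (angstrom^2) of one atomic Cartesian coordinate.

    <dx^2> = (k T / m) * sum_k L_k^2 / omega_k^2, where L_k is the component of the
    normalized, mass-weighted eigenvector of mode k on that coordinate.
    """
    m = mass_amu * AMU
    total = 0.0
    for L, nu in zip(components, wavenumbers):
        omega = 2 * math.pi * C_CM * nu  # angular frequency, rad/s
        total += L * L / (omega * omega)
    return KB * T / m * total * 1e20  # m^2 -> angstrom^2


# Hypothetical carbon coordinate with equal weight in a 15.2 and a 100 cm-1 mode
print(msf_from_modes(12.0, [0.1, 0.1], [15.2, 100.0]))
```

Doubling a mode frequency divides its contribution by four, so the lowest modes dominate the computed fluctuations.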
Validation of the force field using UV resonance Raman intensities. Application to nucleic acid bases.

When absorption occurs during a π→π* excitation, changes in the electronic distribution lead to a modification of the equilibrium positions of the atoms in the excited state. The equilibrium positions of the atoms in the excited state can then be expressed by a vector $X_e = \{X_1, \ldots, X_{3n}\}$ whose elements are functions of the ground-state Cartesian displacement coordinates; $X_e$ represents the displacement of the equilibrium atomic coordinates of the excited electronic state from the equilibrium atomic coordinates of the ground state.

Similarly, we define $R_e = \{R_{e1}, \ldots, R_{e,3n-6}\}$ as the set of internal valence displacement coordinates between the ground and the excited state. $\Delta_e = \{\Delta_{e1}, \ldots, \Delta_{e,3n-6}\}$ is the excited-state displacement of the normal coordinates; it is related to $R_e$ by $R_e = L\,\Delta_e$, L being the matrix of the ground-state vibrational eigenvectors, whose elements $L_{ij}$ give the participation of each internal coordinate $R_i$ in normal mode j. The inverse transformation gives $\Delta_e = L^{-1} R_e = L^{-1} B\,X_e$, where B is the matrix relating internal to Cartesian displacement coordinates. Using ab initio quantum mechanical or semi-empirical calculations, it is possible to calculate the Cartesian coordinates of the potential energy minima of the excited and ground electronic states, and normal mode calculations provide the L and $L^{-1}$ matrices. However, these minimizations are computationally expensive. When the displacement of the potential energy minimum between the ground state and the excited state is small (rigid molecule and first electronic transitions), there is a linear relationship between the change in bond order of the C-C and C-N bonds and the change in their bond length (ref.15). We then have $\Delta_e = L^{-1} R_e$, with $R_e$, the change in bond length from the ground to the excited state, obtained from $(\Delta b.o.)_e$, the change in bond order, through this linear relationship. The intensity $I_i$ of a totally symmetric Raman band depends on the displacement of the corresponding normal mode: $I_i = k\,\omega_i^2\,(\Delta_{ei})^2$, where $\omega_i$ is the vibrational frequency of the i-th normal mode, k a constant and $\Delta_{ei}$ the displacement of the i-th normal mode in the excited state. It is then possible to test the molecular force field by calculating, in the same way, the 3n-6 normal mode frequencies and the 3n-6 corresponding intensities. Setting the intensity of the most intense band to 10, the relative intensities of the other bands can be calculated (refs.15,16).
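The intensity test described above reduces to a few lines of linear algebra. The sketch below computes Δ = L^-1 R_e, forms I_i = k ω_i² Δ_i² and scales the strongest band to 10, so that the unknown constant k cancels. The two-mode L^-1 matrix, the bond-length changes and the wavenumbers are hypothetical numbers; in practice R_e would come from the bond-order changes through the linear relationship of ref.15.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def relative_rr_intensities(L_inv, R_e, wavenumbers, top=10.0):
    """I_i = k omega_i^2 Delta_i^2 with Delta = L^-1 R_e, scaled so the strongest band is `top`.

    The constant k cancels in the normalization.
    """
    delta = matvec(L_inv, R_e)
    raw = [w * w * d * d for w, d in zip(wavenumbers, delta)]
    scale = top / max(raw)
    return [scale * x for x in raw]


# Hypothetical two-mode example: L^-1, bond-length changes (angstrom), wavenumbers (cm-1)
L_inv = [[2.0, 1.0],
         [0.5, -1.5]]
R_e = [0.02, -0.01]
nu = [1600.0, 1250.0]
print([round(i, 2) for i in relative_rr_intensities(L_inv, R_e, nu)])
```

Because each intensity depends on the eigenvectors through L^-1, two force fields that reproduce the same frequencies can still predict different intensity patterns, which is what makes the intensities a useful validation criterion.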

Figure 1 displays the experimental and calculated (vertical lines) resonance Raman intensities of 5'-CMP at 266 and 213 nm. Table 10 gives the corresponding in-plane general force field.

TABLE 10

General valence force field for the in-plane vibrations of 1-methylcytosine.

Force constants are in mdyn/Å for stretchings and in mdyn·Å for bendings.
ν stretching, δ ring in-plane bending, νsk ring skeletal stretching.

Stretching        Bending (atoms)     Other diagonal and interaction terms
N1-C2   8.86      (8,2,3)    1.023    δ(CNH2)            0.286
C2-N3   6.30      (2,3,4)    1.189    δ(NH2)             0.425
N3=C4   5.34      (3,4,5)    0.62     νC=O, νsk          1.637
C4-C5   5.042     (3,4,9)    1.412    νC=O, δ(C=O)       1.363
C5=C6   9.56      (9,4,5)    1.412    νsk, δ(C=O)        0.304
C6-N1   7.34      (4,5,6)    0.62     ν, ν meta         -0.204
N1-H    5.604     (4,5,10)   0.37     ν, ν gem           0.851
C2=O8   10.55     (6,5,10)   0.37     ν, ν para          0.88
C4-N9   6.57      (5,6,1)    0.62     δ, νsk             0.60
C-H     5.28      (5,6,11)   0.412    νsk, δ(N-R)        0.193
N9-R    6.00      (1,6,11)   0.412    νsk, δ(C-H)        0.445
                  (6,1,2)    1.844    ν(N-C), δ(C-H)     0.058
                  (6,1,7)    0.482    νsk, δ(CNH2)       0.45
                  (2,1,7)    0.482    δ(C-H), δ(C-H)    -0.036
                  (1,2,3)    1.62     δ(C-H), δ(N-H)     0.1
                  (1,2,8)    1.534    δ(C=O), δ(N-H)     0.0

Using the force field derived from the UVRR intensities for the guanosine and cytosine moieties, and force constants for ribose derived from the work of Baret et al. (ref.17) on poly(dA)·poly(dT) and poly d(AT)·poly d(AT) in the B form, we have calculated, as a first approximation, the normal modes of the Z form of poly(dG-dC)·poly(dG-dC), as first observed by X-ray diffraction by Wang et al. (ref.18) and by Raman spectroscopy in the solid state by Thamann et al. (ref.19). The characteristic band of the Z form of DNA, observed at 617 cm-1, was calculated at 625 cm-1 and arises from a normal mode coupling between the phosphate-ribose moiety and the glycosidic-guanosine linkage (ref.20).

A second example of application is the transfer of force constants to the calculation of the normal modes of the crystal of the base-pair model 1-methylcytosine-9-ethylguanine (ref.21). The purpose of this work was to study the effects of hydrogen bonds on the normal modes. Particularly interesting is the potential energy distribution of the calculated 780 cm-1 normal mode of the cytosine group, mainly CH out-of-plane vibrations with a small participation of the out-of-plane C=O motion, which was observed in UV resonance Raman experiments at 295 nm by Chinsky et al. (ref.22).


Determination  of  parameters  for  carbohydrates  and  oligosaccharides. 

First, a normal coordinate analysis is performed for both α- and β-D-glucose molecules in the crystalline state. The intra- and intermolecular potential energy function used is the one described above. Above 200 cm-1, the potential energy distribution of each vibration determined by previous calculations for an isolated molecule (refs.23-27) is recovered. Comparison with Hineno's work (ref.26) on β-D-glucose (calculation of an isolated molecule with the Urey-Bradley-Shimanouchi force field) shows important improvements.

The anomeric characteristic groups (C1HOH, C6H2OH) were treated separately from the ring moiety; accordingly, we have distinguished the force constants of the ring, of the C6H2OH group and of the anomeric carbon group. We have found that the values of the force constants decrease considerably for the atomic groups involved in the hydrogen-bonding network, as shown in Table 11.


TABLE 11

              this work    ref.26
H(COH)        0.256        0.413
F(COH)        0.586
H(C6OH)       0.340
F(C6OH)       0.690        0.750
H(HCO)        0.250        0.279
F(HCO)        0.710
H(HC1O)       0.250
F(HC1O)       0.630
H(HC6O)       0.190
F(HC6O)       0.540        0.850

Units are in mdyne/Å.


For the other force constants associated with bond angle deformations, no major changes were observed. The force constants associated with the torsional coordinates differ because of the treatment of hydrogen bonds (see Table 12).


TABLE 12

              this work    ref.26
Y(CC ring)    0.150        0.090
Y(CC side)    0.125        0.090
Y(CO ring)    0.0615       0.100
Y(CO side)    0.065        0.350

Units in mdyne·Å.


It is important to point out that approximately the same set of parameters has been applied to α-D-glucose (the differences lying in the geometry) in order to explain the characteristic frequency shifts between the two anomeric conformers. In the spectral range below 200 cm-1, the usefulness of the electrostatic contributions (charge distribution computed by the AM1 quantum mechanical procedure (ref.28) with a dielectric constant of 3) and the reliability of the treatment of hydrogen bonds by an explicit function are clearly demonstrated by the very close agreement between computed and observed vibrations for both anomers. This set of parameters has then been used for other monosaccharides, i.e. both anomers of galactose, methyl-α-D-glucose and N-acetylglucosamine. The assignment of all the observed bands is possible, and this set of parameters appears to be reliable and transferable. In the near future, molecules such as methyl-β-D-glucose, methyl-α-D-galactose, methyl-β-D-galactose, methyl-α-D-mannose and α-L-fucose will be studied in order to explain the exo-anomeric effect and to assign the frequencies of characteristic groups (in addition to those obtained for both glucose molecules and their deuterated analogues).

To complete the set of parameters, different disaccharides, and thus different types of glycosidic linkage, will be investigated. Disaccharides such as maltose and cellobiose will give parameters for the type of linkage (α or β), whereas disaccharides such as trehalose, sophorose, laminarabiose and gentiobiose will provide parameters for each glycosidic linkage position. The complete set of parameters will be used to perform dynamics simulations of polysaccharides.


CONCLUSION 

Vibrational spectroscopy (frequencies and intensities of vibrational bands) is a very powerful technique for deriving force field parameters, and it also yields precise values of internal rotation barriers. Normal mode calculations of model compounds of biological interest in the crystalline state show that the commonly used potential energy function has to be modified. The parameters obtained, together with the function, may be introduced into the various macromolecular mechanics programs dedicated to the determination of biomolecular structure and dynamics. In the near future, much attention has to be focused on molecules of therapeutic interest (rings, π-electron systems) in order to develop accurate molecular force fields on the basis of local symmetry (group) coordinates. In this case the parameters have to be determined for each structural class of compounds, using both the strategy defined for amino acids and nucleic acid bases and quantum mechanical results.

DISCUSSION


SQUMPAS1S - What are the differences between your calculations of the normal modes of BPTI and those of the Karplus group?

VERGOTEN - The lowest frequency is higher than that calculated by Brooks et al. (PNAS, USA 80 (1983), 6575) and Cusack et al. (J. Mol. Biol., 202 (1988) 903). The atom-type averaged rms fluctuations of the Lys-15 residue are smaller (0.43 Å for the main chain and 0.5 Å for the side chain, instead of 0.7 and 1.5 Å respectively in Brooks' paper), which seems to be in better agreement with the strong stability of the trypsin-inhibitor complex.


STONE  -  Your  calculations  use  a  harmonic  force  field,  so  one  should  not  expect  close 
agreement  with  experiment,  where  anharmonic  effects  contribute.  Can  you  comment 
on  the  effects  of  anharmonicity  ? 

VERGOTEN - In general the harmonic assumption is satisfactory for explaining most internal vibrations and for reflecting the role of intermolecular hydrogen bonding on local vibrations; in some cases, however, observed frequencies cannot be reproduced by a "harmonic" calculation. For large-scale motions, the cubic term in the Taylor expansion of the potential energy has to be included. This is, however, done in the treatment of small molecules.


DYMEK - Which aspects of your approach do you consider to contribute most to the fact that you do not obtain negative eigenvalues in the vibrational analysis?

VERGOTEN - Negative eigenvalues in normal mode analysis may have two origins: either the starting geometry is very bad, or the potential energy (especially the torsional terms and hydrogen-bond parameters) is physically meaningless. For systems for which X-ray data are available, the second explanation is the one to be considered. Usually, when negative eigenvalues are found, people continue to minimize their structure until the problem disappears. It seems to me (and that is what we are doing) that in this case it is more reasonable to try to fit the X-ray structure by modifying the potential energy terms.

SMITH - With reference to the previous comment on negative eigenvalues obtained by the Karplus group at Harvard: these were seen in the first analysis of bovine pancreatic trypsin inhibitor (Brooks, B. and Karplus, M., P.N.A.S. (U.S.A.) 80, 6571 (1983)). Owing to increased computer power, this is no longer a problem, and protein calculations of this type reach energy minima without negative eigenvalues. Some of these analyses produce neutron spectra in good agreement with the experimental results, but the calculated low-frequency density of states is very sensitive to long-range electrostatic effects (Cusack et al, J. Mol. Biol. 202, 903 (1988); Smith et al, Physica B 156 & 157, 437 (1989)). Finally, one must remember that vibrational force fields (as opposed to molecular mechanics force fields) are specialised and cannot give direct structural information, as the geometry used is assumed to be at an energy minimum.

SUMMARY

We review the main ideas underlying numerical optimization, adopting the point of view of actually using an algorithm on a computer. We briefly describe some of the most useful methods, with emphasis on unconstrained problems. Finally, we use an application in molecular biology as an illustration.

I.  GENERAL  IDEAS 

To solve an optimization problem is to find a set of variables satisfying some given constraints and minimizing a given cost-function. We consider only the so-called mathematical programming problem, in which the variables vary in $\mathbb{R}^n$ and there are finitely many constraints.

The solution (if there is one) is actually approximated using an algorithm that constructs an iterative sequence of trial solutions. The user must provide a program which, for any given value of the variables, computes the corresponding values of the cost and/or constraints (and also their partial derivatives; we will return to this later). From this point of view, optimization is fairly similar to, say, solving nonlinear systems or differential equations.

We explain the main ideas that are used to define optimization algorithms. They are best viewed in the unconstrained case, i.e. when the variables are allowed to take any value in $\mathbb{R}^n$. In this framework, the simplest problem arises when the cost-function is quadratic, say

$$f(x) = \tfrac{1}{2}\, x^T A x + b^T x + c.$$

Here, $A$ is the (symmetric) matrix of second derivatives of $f$, which does not depend on $x$. When $A$ is positive definite, the problem then reduces to a linear system of equations: $Ax + b = 0$.
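Since the quadratic case reduces to a single linear solve, it can be written in a few lines. The sketch below uses an arbitrarily chosen symmetric positive definite `A`, together with `b` and `c`. These values are hypothetical and do not come from any application in this paper. `numpy.linalg.solve` performs the solve:

```python
import numpy as np

# Hypothetical data: A is symmetric positive definite, so f has a unique minimum.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
c = 0.5


def f(x):
    """Quadratic cost f(x) = 1/2 x^T A x + b^T x + c."""
    return 0.5 * x @ A @ x + b @ x + c


def grad_f(x):
    """Gradient of f, equal to A x + b because A is symmetric."""
    return A @ x + b


# The minimizer is the solution of the linear system A x + b = 0.
x_star = np.linalg.solve(A, -b)
```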

When $f$ is not quadratic, its matrix of second derivatives varies with $x$. To be efficient, an optimization algorithm must somehow estimate this matrix, thus obtaining a quadratic function that is supposed to approximate $f$. This is the basis for Newton-like methods, conjugate gradient, etc. We demonstrate this approach and explain how it can be adapted to various situations, such as when the problem is large-scale or stiff, or when it originates from a least-squares problem.

Finally, we show how the above ideas can be extended to constrained situations: first to the case of linear constraints (the feasible domain is a polyhedron), and then to the general case where the constraints are nonlinear. In the former case, the sequence of iterates is usually feasible, while it can hardly be so in the latter: nonlinear constraints are satisfied only asymptotically.

II. THE PROBLEM TO SOLVE

We will denote the problem of interest in this paper by

$$
\min_{x \in \mathbb{R}^n} f(x)
\quad \text{subject to} \quad
\begin{cases}
g_i(x) \le 0, & i = 1, \dots, m,\\
g_j(x) = 0, & j = m+1, \dots, m+p.
\end{cases}
\tag{1}
$$

In this notation, $x$ represents the control variables and $f$ is the cost-function. The $g$'s are the constraints (inequality constraints for the first $m$, equality constraints for the last $p$), which restrict the set of feasible variables. An $x \in \mathbb{R}^n$ satisfying all the constraints will be called feasible. Note that $f$ is scalar-valued: there is only one objective function (this is very important; we do not consider multi-objective optimization). In other words, we want a value of $x \in \mathbb{R}^n$ satisfying all the constraints and making the objective function as small as possible.

Remark. 

If, instead of $\mathbb{R}^n$, $x$ were varying in a finite or discrete set, we would obtain a so-called integer programming, or combinatorial optimization, problem. If $\mathbb{R}^n$ were actually an infinite-dimensional space, i.e. if $x$ were a function, say of time (instead of a familiar vector with finitely many coordinates), we would have an optimal control problem. Neither of these two cases will be studied here. ■

Depending on the type of constraints, several classes of problems must be considered:

1. Unconstrained problems, i.e. $m = p = 0$. These problems are prototypical and serve as models for the others, which carry various types of constraints. As seen above, the case of a quadratic $f$ is trivial: one then has to solve a linear system expressing that $\nabla f(x) = 0$. When $f$ is not that special, a solution can only be approximated by an iterative algorithm.

2. Linearly constrained problems (the constraints are affine functions, say $g_i(x) = a_i^T x + b_i$, where $a_i \in \mathbb{R}^n$ and $b_i$ is a number).

2.1. Problems with equality constraints only ($m = 0$, $p > 0$). These can actually be recast into class 1 (for example, use the constraints to extract $n - p$ independent, and unconstrained, variables).

2.2. Problems with inequality constraints ($m > 0$, $p$ arbitrary):

2.2.1. Linear programming, where $f$ is affine (as are the constraints). This is an intermediate case with combinatorial optimization, which requires a special method: the simplex method or, more recently, interior-point (Karmarkar) methods.

2.2.2. Linear-quadratic problems, where $f$ is quadratic (and the constraints affine); hardly more difficult than 2.2.1, they can also be solved exactly.

2.2.3. All the others, which can only be solved iteratively.

3. General problems, where one again distinguishes:

3.1. Problems with equalities only, of the same type as class 1 in that they have comparable complexity.

3.2. Problems with inequalities, or general nonlinear programming.
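The reduction of class 2.1 to class 1 can be sketched as follows. This is a minimal illustration built on a hypothetical quadratic objective (the distance to a `target` point) and one hypothetical affine constraint. The $n - p$ free variables are taken as coordinates `y` in an orthonormal basis `Z` of the null space of the constraint matrix. Here `Z` comes from an SVD, which is only one of several possible ways to extract these variables:

```python
import numpy as np

# Hypothetical problem: minimize 1/2 ||x - target||^2 subject to C x + e = 0.
target = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, 1.0, 1.0]])      # p = 1 affine equality constraint
e = np.array([-3.0])                 # i.e. x1 + x2 + x3 = 3

# A particular feasible point x0 (least-norm solution of C x = -e).
x0 = np.linalg.lstsq(C, -e, rcond=None)[0]

# Rows of Vt beyond the rank of C span its null space: n - p = 2 directions.
_, _, Vt = np.linalg.svd(C)
Z = Vt[C.shape[0]:].T

# Every feasible x is x0 + Z y with y free, so the problem is now unconstrained.
# For this quadratic objective the reduced problem is a linear system in y.
y = np.linalg.solve(Z.T @ Z, Z.T @ (target - x0))
x_opt = x0 + Z @ y
```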

It should be noted that, as soon as inequality constraints are added to a given problem, its theoretical complexity increases drastically. There are potentially $2^m$ equality-constrained problems to solve, since one does not know a priori which inequalities will be active at the optimum. In practice, however, the situation is not that bad: it is usually not necessary to visit all these combinations.

In the above classification, several subcases must be distinguished depending on the form of the (nonlinear) functions involved:

a. Sum of squares: $f(x) = \tfrac{1}{2}\sum_v F_v(x)^2$, i.e. one wants to solve a system of equations $F(x) = 0$ by the least-squares method. The vector $F$ need not be in $\mathbb{R}^n$; it lies in some other space, usually of larger dimension (there are "too many equations"). Here again, a trivial case is when $F$ is affine (linear regression).

b. Nondifferentiable problems are those in which one of the functions, say $f$, has discontinuous first derivatives. Typical examples arise when one chooses in a. above, instead of the $\ell_2$-norm:

* either the $\ell_1$-norm, in which $f(x) = \sum_v |F_v(x)|$ (derivatives fail to exist when some $F_v = 0$),

* or the $\ell_\infty$-norm: $f(x) = \max_v |F_v(x)|$ (derivatives fail to exist when several $|F_v|$ are maximal).

c. Large-scale problems: when $n$ is large, essentially when it is impossible, or inconvenient, to compute or even store the $n^2$ second derivatives of $f$; likewise when it is impossible, or inconvenient, to store the Jacobian matrices of the constraints.
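The trivial affine case of item a (linear regression) can be solved directly. The sketch below uses made-up data points that lie exactly on a line. `numpy.linalg.lstsq` serves here as one standard least-squares solver:

```python
import numpy as np

# Made-up data lying exactly on the line y = 1 + 2 t.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# F(x) = J x - y is affine in x = (intercept, slope): 4 equations, 2 unknowns.
J = np.column_stack([np.ones_like(t), t])


def f(x):
    """Sum-of-squares cost f(x) = 1/2 sum_v F_v(x)^2."""
    residual = J @ x - y
    return 0.5 * residual @ residual


x_ls, *_ = np.linalg.lstsq(J, y, rcond=None)
```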

Finally, another subclassification is based on the available information concerning $f$ and $g$. This is the subject of the next section.

III. GENERAL STRUCTURE OF AN OPTIMIZATION PROGRAM

When actually implemented on a computer, an optimization program consists of two very distinct parts. One is the algorithm proper, which can be considered the decision-maker: it builds the sequence $(x_k)$ of iterates, which is supposed to converge to an optimum. The second part defines the functions characterizing the problem, $f$ and $g$; it provides the decision-maker with the information needed for an intelligent construction of $(x_k)$. We will call this second part the simulator.

An essential feature of optimization problems is that the simulator gives purely pointwise information. The algorithm must content itself with numerical values of $f$, $g$, etc. at numerical values of $x$, and nothing more. In particular, no global information is available other than theoretical properties (continuity, differentiability...). The simulator should be thought of as, say, a bunch of punched cards involving a huge amount of coding that is impossible to analyze (at least in the present state of computer science). This is common to other nonlinear problems such as, say, differential equations. A second feature (which is not shared by differential equations) is that the algorithm has to proceed by trial and error: one is never sure in advance that a given $x$ will be a good iterate.

A naive example will illustrate these two points. Imagine a tennis player practicing his serve. He wants to optimize something like the final speed of the ball (when it reaches his opponent's court), and he can control the initial conditions (velocity, direction, spin...). Once these initial conditions are fixed, the ball follows a certain trajectory and reaches the final state of interest. Upon observing this state, the player corrects his initial conditions, observes the resulting effect, and so on. To implement this system on a computer, one needs above all the differential equations that simulate the trajectory. A tennis court is then no longer needed, provided that one has on hand an intelligent program to iterate over the initial conditions. One feels that important information for this program will be, among other things, the sensitivity of the final state with respect to the initial conditions (an experienced tennis player feels this sensitivity, at least qualitatively).

It is essential that the above decomposition "algorithm + simulator" be reflected in the general organization of the computer program that performs the optimization. The algorithm is generally written by an applied mathematician (it is a full-time profession...), the simulator by a user from some other discipline (physics, chemistry, economics...), and the two programmers may well never meet.

As suggested by the tennis example above, the simulator can be more or less sophisticated, and there are several cases:

(i) At each $x$, the simulator merely computes $f(x)$ and the constraint vector $g(x)$ (this is the least that can be asked of it!).

(ii) It also computes the partial derivatives, to obtain the gradient $\nabla f(x)$ and the Jacobian $\nabla g(x)$.

(iii) In addition, it computes the second derivatives, say $\nabla^2 f(x)$ ($\nabla^2 g$ is rarely useful).
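The split between algorithm and simulator can be mirrored in code by letting the algorithm see the problem only through a callback. Below is a minimal sketch of a case (ii) simulator, with Rosenbrock's test function as a hypothetical stand-in for a real physical model. `Simulator`, `calls` and `check_gradient` are illustrative names. The finite-difference comparison is a common safeguard for hand-written derivative code; it is not part of any particular method described here:

```python
import numpy as np


class Simulator:
    """Case (ii) simulator: returns f(x) and the gradient of f at x, and nothing more.

    Rosenbrock's function stands in for a real model (hypothetical choice).
    """

    def __init__(self):
        self.calls = 0                        # number of requests from the algorithm

    def __call__(self, x):
        self.calls += 1
        f = (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2
        grad = np.array([
            -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
            200.0 * (x[1] - x[0] ** 2),
        ])
        return f, grad


def check_gradient(simulator, x, h=1e-6):
    """Largest gap between the returned gradient and central finite differences."""
    _, grad = simulator(x)
    fd = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        fd[i] = (simulator(x + e)[0] - simulator(x - e)[0]) / (2.0 * h)
    return np.max(np.abs(grad - fd))
```

The algorithm side only ever calls `sim(x)`; it never inspects the formulas inside the simulator.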

Remark.

Note that in the case of linear functions, say $g_i(x) = a_i^T x + b_i$, no simulator is needed in principle. The mere knowledge of $a_i$ and $b_i$ (a finite amount of data, instead of a subroutine) is sufficient to compute all the possible values of the corresponding function $g_i$. The case of quadratic functions is similar. Nevertheless, a simulator may sometimes be useful in this situation. For example, gradient vectors such as $a_i$ and/or Hessian matrices may be impossible to compute fully, or even to store, while a subroutine computing $a_i^T x + b_i$ might be easy to write and execute. ■

The most common case is (ii). We know of no situation in which the simulator returns more information than in (iii). Yet this latter case (iii) is rarely rewarding: much work is required from the simulator, which usually results in only a slight improvement of possible algorithms. As for (i) (no derivatives), it should be avoided by all means. It only allows algorithms that are intolerably slow, if they converge at all; the most popular of these is the method of Nelder & Mead.

Remark. 

Thus, a dialogue in which first derivatives are provided can be considered the standard situation, striking an appropriate balance between efficiency and complexity. It should be noted, however, that requiring the user to compute these derivatives is a serious drawback of numerical optimization. Although differentiation is always possible in theory, writing the corresponding code can be highly complex in practice. Examples are not rare where it requires several man-years. ■

IV.  GENERAL  STRUCTURE  OF  AN  OPTIMIZATION  ALGORITHM 

In view of the limited information available from the simulator, an optimum can be detected only via a trial-and-error scheme (remember the tennis player: he is never sure that his next serve is going to be better than the present one). Thus, one has to use an algorithm to generate a sequence of trial iterates, say $(x_k)$ (the serves of the tennis player), hopefully converging to an optimal solution. Such an optimization algorithm is usually based on the following ideas:


- A test-function $\varphi(x)$ is defined, which must be minimal at a solution of (1). For unconstrained problems, one of course takes $\varphi = f$. For linear constraints, it is usual to organize the calculations so that the whole sequence $(x_k)$ is feasible (possibly after a Phase 1 aimed at finding an initial feasible iterate); one can then also take $\varphi = f$. Nonlinearly constrained problems are more delicate, since it is in general impossible to find a feasible $x$ in a finite amount of calculation. In those cases, $\varphi$ must be a compromise between $f$-values and violation of the constraints. Generally, one takes an exact penalty function

$$\varphi_r(x) = r\, f(x) + \sum_{i=1}^{m} \max[0, g_i(x)] + \sum_{j=m+1}^{m+p} |g_j(x)| \tag{2}$$

where $r$ is a suitably small coefficient.

- At a given iterate $x_k$, a direction $d_k \in \mathbb{R}^n$ is computed. It is a descent direction, i.e. $\varphi$ decreases locally along $d_k$: $\varphi(x_k + t d_k) < \varphi(x_k)$ for $t > 0$ small enough. To compute this direction, the differential information concerning (1) is useful, if not compulsory.

- Then a descent step is made. It consists of searching for $t_k > 0$ such that $x_{k+1} = x_k + t_k d_k$ satisfies $\varphi(x_{k+1}) < \varphi(x_k)$. This search is by no means easy, because of the poor information provided by the simulator. It is here that the trial-and-error process must take place. This part of the algorithm is sometimes called the line-search.

- Finally, a stopping criterion is applied to detect whether the current $x_k$ is close enough to optimality. Keeping (2) in mind, this is based on checking the magnitude of the gradient of $\varphi$.
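Formula (2) is simple to evaluate once the simulator has returned $f$ and the constraint values. The sketch below uses a hypothetical two-variable problem with one inequality and one equality constraint, and the arbitrary value r = 0.1. It only shows how constraint violation raises $\varphi_r$; it says nothing about how r should be chosen:

```python
import numpy as np


def exact_penalty(f_val, g_ineq, g_eq, r):
    """phi_r = r f + sum_i max(0, g_i) + sum_j |g_j|, as in (2)."""
    return r * f_val + np.sum(np.maximum(0.0, g_ineq)) + np.sum(np.abs(g_eq))


def phi(x, r=0.1):
    """Hypothetical problem: f = x1 + x2, g_1 = x1^2 + x2^2 - 2 <= 0, g_2 = x1 - x2 = 0."""
    f_val = x[0] + x[1]
    g_ineq = np.array([x[0] ** 2 + x[1] ** 2 - 2.0])
    g_eq = np.array([x[0] - x[1]])
    return exact_penalty(f_val, g_ineq, g_eq, r)


feasible = np.array([-1.0, -1.0])    # satisfies both constraints: only r * f contributes
infeasible = np.array([-2.0, 0.0])   # violates both: the violations are added
```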

One iteration of a minimization algorithm at a given iterate $x_k$ thus consists of two essential tasks: first compute the descent direction $d_k$, then compute the actual stepsize $t_k$. The direction is the more important problem and will be considered in the next section. Here, we give a few indications concerning the line-search.

Given the starting point $x$ ($= x_k$) and the descent direction $d$ ($= d_k$), the simulator is called at $x + td$ for a trial stepsize $t$. Depending on its answer, this $t$ is either accepted (and the next iterate is $x_{k+1} = x + td$) or corrected until it becomes convenient. Thus, a line-search is a trial-and-error process defined by two things: what a convenient stepsize is, and how an inconvenient stepsize is corrected.

The stepsize is corrected on the basis of a safeguarded polynomial fit. In case (ii), one has on hand the value of $\varphi$ and of $\nabla\varphi$ at each trial stepsize, including $t = 0$ (the simulator has been called at the current iterate $x_k$). Most commonly, one uses this information at the current $t$ and at the previous $t$ to define a cubic function of the single variable $t$, whose local minimum (when it exists) gives the next trial stepsize. In addition, one has on hand a "bracket", also made of two previous trials, inside which a convenient stepsize is known to lie. The cubic interpolant is then forced inside the bracket, which is thus contracted at each trial.

Defining the set of convenient stepsizes (i.e. the stopping criterion for the line-search) is by far the more important item. It must be tolerant enough to let the line-search stop quickly, and strict enough to ensure convergence of $(x_k)$ toward a minimum. Roughly speaking, the line-search is stopped if the current $t > 0$ is

- not too large, which is quantified by the fact that $\varphi(x + td) - \varphi(x)$ is sufficiently negative (if this is not the case, $t$ becomes the right end of the bracket);

- not too small, which is quantified by the fact that $\varphi'(x + td) - \varphi'(x)$ is sufficiently positive. Here $\varphi'$ denotes the derivative of $\varphi$, considered as a function of the single variable $t$. When $t$ is neither too large nor too small, the line-search is finished; if $t$ is not too large but too small, it becomes the left end of the bracket.

Properly programmed line-searches thus typically accomplish an iteration in 2 or 3 trials or fewer.
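The two tests above are what are commonly called the Armijo (sufficient decrease) and Wolfe (curvature) conditions. Writing the second one as $\varphi'(t) \ge m_2\,\varphi'(0)$ is equivalent to requiring $\varphi'(t) - \varphi'(0) \ge (m_2 - 1)\,\varphi'(0) > 0$. The sketch below is a bracketing line-search that uses the typical textbook constants m1 = 1e-4 and m2 = 0.9, which are not values from this paper. For brevity, it also uses plain bisection inside the bracket in place of the safeguarded cubic fit described above:

```python
import math


def line_search(phi, dphi, t=1.0, m1=1e-4, m2=0.9, max_trials=50):
    """Return a stepsize t > 0 that is neither too large nor too small.

    phi(t) and dphi(t) give the value and the derivative of the test-function
    along the search direction. m1 and m2 (0 < m1 < m2 < 1) are assumed
    textbook constants; bisection replaces the safeguarded cubic fit.
    """
    phi0, dphi0 = phi(0.0), dphi(0.0)
    if dphi0 >= 0:
        raise ValueError("the direction is not a descent direction")
    left, right = 0.0, math.inf                # bracket containing a convenient t
    for _ in range(max_trials):
        if phi(t) - phi0 > m1 * t * dphi0:     # decrease not sufficient: t too large
            right = t
        elif dphi(t) < m2 * dphi0:             # slope still too negative: t too small
            left = t
        else:
            return t                           # convenient stepsize found
        # Next trial: bisect the bracket, or extrapolate while no right end is known.
        t = 0.5 * (left + right) if right < math.inf else 2.0 * t
    return t
```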

V.  USEFUL  MINIMIZATION  ALGORITHMS 

Since all optimization algorithms use similar line-search procedures, they differ only in the way the direction is computed. This direction is computed on the basis of theoretical considerations aimed at decreasing the function $\varphi$ of (2). This contrasts with the line-search, which works a posteriori, on the firm ground of actual $\varphi$-values. The basic reasoning for the direction is as follows: one supposes that, close to $x_k$, $\varphi$ has such-and-such a form, and then asks what a suitable direction should be under this simplifying assumption. Specifically, the objective function is assumed quadratic and the constraints (if any) linear. This places the problem in class 2.2.2, which can be solved exactly.

Remark. 

We mention here that, in the case of nonlinearly constrained problems, the relevant curvature is not that of $f$ but that of the Lagrange function: the curvature of the constraints must be taken into account as well.

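Under this linear-quadratic assumption, the function of (2) becomes

$$
\varphi_r(x) = r\left[f(x_k) + \nabla f(x_k)^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T A\,(x - x_k)\right]
+ \sum_{i=1}^{m} \max\left[0,\ g_i(x_k) + \nabla g_i(x_k)^T (x - x_k)\right]
+ \sum_{j=m+1}^{m+p} \left|g_j(x_k) + \nabla g_j(x_k)^T (x - x_k)\right|
$$

in which $A$ represents the relevant curvature. Minimizing $\varphi_r$ amounts to solving the following problem with $n + m + p$ variables and $2(m + p)$ linear constraints:

$$
\begin{aligned}
\min_{x,\,v}\quad & r\left[\nabla f(x_k)^T (x - x_k) + \tfrac{1}{2}(x - x_k)^T A\,(x - x_k)\right] + \sum_{i=1}^{m} v_i + \sum_{j=m+1}^{m+p} v_j\\
\text{subject to}\quad & v_i \ge 0,\quad v_i \ge g_i(x_k) + \nabla g_i(x_k)^T (x - x_k), && i = 1, \dots, m,\\
& v_j \ge g_j(x_k) + \nabla g_j(x_k)^T (x - x_k),\quad v_j \ge -g_j(x_k) - \nabla g_j(x_k)^T (x - x_k), && j = m+1, \dots, m+p,
\end{aligned}
\tag{3}
$$

whose solution can be considered an approximation of a solution of (1) (if $x_k$ is close to such a solution, and if $r$ is small). ■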

Second-order effects are taken into account when computing the direction for the sake of efficiency. As mentioned earlier, no definite increase in performance is usually to be expected from higher-order considerations. On the other hand, a first-order analysis is definitely insufficient and gives drastically worse results. As a rule, an optimization algorithm is efficient if and only if it is based on a second-order analysis.

Naturally, second-order information is not always available from the simulator, especially in the standard case (ii). Indeed, the whole issue when computing the direction is to identify the (necessary but unknown) second derivatives. We review below some of the most widely used methods, depending on the situation. For the sake of simplicity, we consider the unconstrained case only: the problem is to minimize the function $f$ over the whole of $\mathbb{R}^n$. As mentioned earlier, there is then no point in considering any special function $\varphi$, which is just $f$.

In case (iii), we can write

$$f(x_k + d) \simeq f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2}\, d^T \nabla^2 f(x_k)\, d,$$

so it makes sense to take $d$ minimizing this quadratic approximation, i.e. to solve

$$\nabla^2 f(x_k)\, d = -\nabla f(x_k). \tag{4}$$

We recognize the Newton method for solving $\nabla f(x) = 0$. The possibility of performing a line-search along the resulting direction is a great advantage over general systems of equations: stability is automatically obtained from the mere property $f(x_{k+1}) < f(x_k)$. Naturally, the Newton iterate is obtained with the stepsize $t = 1$. This would be the only a priori reasonable stepsize if $f$ were not known in addition to the vector field $\nabla f$, so $t = 1$ can be used to initialize the line-search. All this is valid only if the matrix $\nabla^2 f$ is positive definite. Otherwise, the Newton direction might not be downhill; furthermore, the Newton iteration would tend to converge to a saddle point of $f$ rather than a minimum. When $\nabla^2 f$ is positive definite, (4) is best solved by the Cholesky factorization of $\nabla^2 f$. We add here that, for a large problem (when $\nabla^2 f$ is impractical), (4) can be solved by an iterative method (relaxation, conjugate gradient...) interrupted before convergence, so as to get a reasonable direction in a reasonable computing time. This gives truncated Newton methods.
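The following is a minimal sketch of a damped Newton iteration. It relies on a case (iii) simulator for a hypothetical strictly convex test function, so the Hessian is always positive definite. Direction (4) is obtained through a Cholesky factorization. If the factorization fails, the sketch simply falls back to $-\nabla f$, a crude substitute for the Hessian modifications used in practical codes. The line-search is likewise reduced to backtracking from t = 1:

```python
import numpy as np


def newton_direction(grad, hess):
    """Solve hess @ d = -grad by Cholesky; return -grad if hess is not positive definite."""
    try:
        L = np.linalg.cholesky(hess)         # hess = L L^T; raises unless positive definite
    except np.linalg.LinAlgError:
        return -grad                         # fallback (simplification, see text)
    z = np.linalg.solve(L, -grad)            # L z = -grad (generic solver used for brevity)
    return np.linalg.solve(L.T, z)           # L^T d = z


# Hypothetical strictly convex test function and its derivatives (case (iii)).
def f(x):
    return x[0] ** 2 + x[1] ** 2 + np.exp(x[0] + x[1])


def grad_f(x):
    e = np.exp(x[0] + x[1])
    return np.array([2.0 * x[0] + e, 2.0 * x[1] + e])


def hess_f(x):
    e = np.exp(x[0] + x[1])
    return np.array([[2.0 + e, e], [e, 2.0 + e]])


x = np.array([1.0, -0.5])
for _ in range(20):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-10:            # stopping criterion on the gradient
        break
    d = newton_direction(g, hess_f(x))
    t = 1.0                                  # the Newton step initializes the line-search
    while f(x + t * d) > f(x) + 1e-4 * t * (g @ d) and t > 1e-12:
        t *= 0.5                             # plain backtracking instead of a cubic fit
    x = x + t * d
```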

Most often, namely in case (ii), $\nabla^2 f$ is not available. The idea is then to take $d_k = -G_k \nabla f(x_k)$, where the matrices $G_k$ are computed recursively with the aim of approaching the inverse of $\nabla^2 f$, just as $x_k$ is supposed to approach a solution of (1). These are quasi-Newton methods. They are mostly excellent, although little known outside the world of professional optimizers (and yet they were discovered in 1959 by W.C. Davidon, a physicist!). They generalize to several dimensions the so-called secant method, or "regula falsi", which works as follows. To solve the nonlinear equation $g(x) = 0$ ($x \in \mathbb{R}$), Newton's method is $x_{k+1} = x_k - g'(x_k)^{-1} g(x_k)$. In the secant method, the unknown $g'(x_k)$ (playing the role of $\nabla^2 f(x_k)$) is approximated by the difference quotient $[g(x_k) - g(x_{k-1})] / [x_k - x_{k-1}]$ (playing the role of $G_k^{-1}$). In other words, the unknown but necessary differential information of $g$ is identified by observing $g$-differences. In $\mathbb{R}^n$, where $g$ is $\nabla f$, the differences $y := g(x_+) - g(x)$ and $s := x_+ - x$ are stored. At iteration $k$, $k$ such differences can be used to form an approximation $G_k$ of $\nabla^2 f(x_k)^{-1}$. Several such approximations are possible; one is admittedly the best, usually called BFGS (for Broyden, Fletcher, Goldfarb, Shanno). At the first iteration, one starts with an initial approximation of $(\nabla^2 f)^{-1}$. Then, having $G_k$ and the current pair $(s_k, y_k)$ of differences, $G_{k+1}$ is explicitly computed by the following formula (we drop the index $k$ to lighten the notation):

$$G_+ = G - \frac{s\,y^T G + G\,y\,s^T}{s^T y} + \left(1 + \frac{y^T G\, y}{s^T y}\right) \frac{s\,s^T}{s^T y}. \tag{5}$$
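Formula (5) translates directly into code. The sketch below assumes $s^T y > 0$, which holds whenever the line-search enforces the "not too small" test, since then $y^T s = t\,[\varphi'(t) - \varphi'(0)] > 0$. The name `bfgs_update` is illustrative. The check is the secant condition $G_+ y = s$, the multidimensional counterpart of the difference quotient:

```python
import numpy as np


def bfgs_update(G, s, y):
    """Inverse BFGS update (5): G_+ from G and the pair s = x_+ - x, y = grad_+ - grad."""
    sy = s @ y
    if sy <= 0.0:
        raise ValueError("update (5) requires s^T y > 0")
    Gy = G @ y                               # G is symmetric, so y^T G = (G y)^T
    return (G
            - (np.outer(s, Gy) + np.outer(Gy, s)) / sy
            + (1.0 + (y @ Gy) / sy) * np.outer(s, s) / sy)


# Hypothetical quadratic with Hessian A, so that y = A s exactly.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
s = np.array([1.0, 0.5])
G_new = bfgs_update(np.eye(2), s, A @ s)
```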

Note that all the information making up $G_k$ is contained in $k$ pairs of $n$-vectors (instead of $n^2$ numbers). This makes it possible to store not the matrices $G_k$ themselves but rather the pairs $(s, y)$, and economical formulae allow the direction to be computed without explicit use of the recurrence formula (5). This is especially handy for large problems, opening the way to limited-memory methods, in which the user imposes the maximal number of pairs $(s, y)$ to be stored, depending on $n$ and the computer. The most economical such method is the conjugate gradient, in which the recurrence formulae reduce to

$$d_{k+1} = -\nabla f(x_{k+1}) + \alpha_k d_k.$$
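One common form of these economical formulae is the "two-loop recursion" of limited-memory BFGS, which computes $-G_k \nabla f(x_k)$ from the stored pairs alone. The sketch below assumes that the initial approximation is the identity matrix (practical codes usually rescale it) and that pairs are stored oldest first. Under these assumptions the result coincides with applying (5) explicitly, at a cost proportional to $n$ times the number of pairs:

```python
import numpy as np


def two_loop_direction(grad, pairs):
    """Return d = -G grad without forming G.

    G is the matrix obtained from the identity by applying update (5) with the
    stored pairs (s, y), oldest first. Assumes s^T y > 0 for every pair.
    """
    q = np.array(grad, dtype=float)           # work on a copy of the gradient
    history = []
    for s, y in reversed(pairs):              # first loop: newest pair first
        rho = 1.0 / (s @ y)
        alpha = rho * (s @ q)
        q -= alpha * y
        history.append((rho, alpha))
    r = q                                     # multiply by G_0 = identity (assumption)
    for (s, y), (rho, alpha) in zip(pairs, reversed(history)):  # second loop: oldest first
        beta = rho * (y @ r)
        r += (alpha - beta) * s
    return -r
```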

When $f$ is a sum of squares, the Gauss-Newton technique can be used: $G_k$ is the inverse of the matrix $\sum_v \nabla F_v(x_k) \nabla F_v(x_k)^T$. This matrix differs from the true $\nabla^2 f$ by the term $\sum_v F_v \nabla^2 F_v$. That term is small when the $\nabla^2 F_v$ are small (moderately nonlinear regression) or when the $F_v$ are small (equations almost satisfied). We thus have one more way of approximating second-order information with the sole help of a first-order simulator. In "bad" cases, the above matrix is only positive semi-definite; a term $\lambda I$ is then added to make it positive definite. This is the Levenberg-Marquardt technique.
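The following is a minimal sketch of the Gauss-Newton / Levenberg-Marquardt step. It fits a hypothetical exponential-decay model $a\,e^{-bt}$ to made-up, noise-free data, with $f = \tfrac{1}{2}\sum_v F_v^2$ as in item a. Setting `lam = 0` gives the Gauss-Newton direction. The rule for updating `lam` (halve on success, multiply by 10 on failure) is a simple heuristic chosen for illustration and is not prescribed here:

```python
import numpy as np

# Hypothetical noise-free data generated by a = 2, b = 0.5.
t_data = np.linspace(0.0, 4.0, 9)
y_data = 2.0 * np.exp(-0.5 * t_data)


def residuals(x):
    """F_v(x) = a exp(-b t_v) - y_v for each data point v."""
    a, b = x
    return a * np.exp(-b * t_data) - y_data


def jacobian(x):
    """Row v is the gradient of F_v with respect to (a, b)."""
    a, b = x
    e = np.exp(-b * t_data)
    return np.column_stack([e, -a * t_data * e])


def cost(x):
    """f(x) = 1/2 sum_v F_v(x)^2."""
    F = residuals(x)
    return 0.5 * F @ F


def lm_step(x, lam):
    """Solve (J^T J + lam I) d = -J^T F; lam = 0 gives the Gauss-Newton step."""
    J, F = jacobian(x), residuals(x)
    M = J.T @ J + lam * np.eye(x.size)
    return np.linalg.solve(M, -J.T @ F)


x = np.array([1.0, 1.0])
lam = 1e-2
for _ in range(100):
    d = lm_step(x, lam)
    if cost(x + d) < cost(x):
        x, lam = x + d, lam * 0.5    # success: move closer to Gauss-Newton
    else:
        lam *= 10.0                  # failure: shorter step, closer to steepest descent
```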

For the sake of completeness, we mention that the concept of second order is irrelevant for nondifferentiable problems (what is the second derivative of a function that has no first derivative?). In this case, the mere requirement $f(x_{k+1}) < f(x_k)$ is already difficult enough to achieve. There exist subgradient methods, in which this descent property is simply given up. There are also bundle methods, which are highly sophisticated and in which computing the direction is a nontrivial problem per se (whereas explicit formulae give this direction in all the cases above).

For linearly constrained problems, there are feasible-direction methods with an active-set strategy. One proceeds essentially as before, except that the direction is "twisted" so that $x_k + t d_k$ satisfies the constraints for $t > 0$ small enough. In these methods, each $x_k$ is feasible and $\varphi$ can be chosen equal to $f$. Specifically, one chooses a set $I \subset \{1, \dots, m\}$ of constraints that one wants to keep active from $x_k$. Having the matrix $G_k$ (whatever it is), $d_k$ solves


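$$
\min_{d}\; \nabla f(x_k)^T d + \tfrac{1}{2}\, d^T G_k^{-1} d
\quad \text{subject to} \quad
\nabla g_i(x_k)^T d = 0,\ i \in I, \qquad \nabla g_{m+j}(x_k)^T d = 0,\ j = 1, \dots, p
$$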
(note that this amounts to solving a linear system with $n + p + |I|$ equations). Then a line-search is made according to the same principles as before, with the extra requirement that $x_k + t d_k$ must be kept feasible; in practice, this imposes a maximal value on $t_k$. A characteristic of these methods is that the set $I$ of "activated" constraints can change by at most one index at each iteration. This can be a very inconvenient limitation if the number of linear constraints is large. It is then often rewarding to pretend that the constraints are nonlinear and to treat the problem as in the next case.

The currently best methods for nonlinear constraints are based on the following principles. At each iteration, (1) is replaced by a "tangent linear-quadratic problem" at $x_k$; namely, $d_k$ is the value $x - x_k$ obtained after solving (3). The iterates are not necessarily feasible, and the line-search is performed with the help of the function $\varphi_r$. The theory of Lagrange multipliers plays an important role; in particular, it is used to construct the matrix $A$ in (3) and to select proper values of $r$. These methods are rather sophisticated. Among other things, solving (3) is by no means trivial because of the inequality constraints, even though they are linear.

VI. EXAMPLE: IDENTIFYING A MOLECULAR STRUCTURE

An important problem in biochemistry is to determine the location in space of the atoms making up a given molecule. There are two possible approaches:

- Experimental methods. The actual molecule is observed by some physical means (X-ray diffraction, nuclear magnetic resonance, ...), giving some information on the Fourier transform of its electron distribution.

- Theoretical methods. A potential energy is attached to each possible configuration, and this energy has to be minimal at the actual configuration.

Both give rise to an optimization problem, and we will consider the second approach, which is a lot easier to explain. Let the atoms be numbered from 1 to N and denote by X_i ∈ R^3 the position of atom i. Note: X_i can be thought of as the 3 Cartesian coordinates of atom i, but it may be sounder to choose other parametrizations. A first important thing to understand is that our problem is solved if we can assign "the" correct values to X_1, X_2, ..., X_N. In other words, we have a problem with n = 3N unknowns. The second important thing is that, when these unknowns are assigned arbitrary numerical values, a "performance index" (the conformational energy) can be computed, to tell us how good these values are. This energy is the sum of a number of terms, for example

- Bond length: if d is the distance between two given atoms, there is an energy related to it, say

$$D(d) := \tfrac{1}{2}\,\delta\,(d-\bar d)^2$$

where δ and d̄ are two constants characterizing the considered pair of atoms.
- Valence angle: if θ is the angle formed by 3 atoms, another energy is (β and θ̄ are constants as before)

$$B(\theta) := \tfrac{1}{2}\,\beta\,(\theta-\bar\theta)^2.$$

- Van der Waals: again for a pair of atoms, another energy is

$$V(d) := v_1\,(\bar d/d)^{6} - v_2\,(\bar d/d)^{12}.$$

We limit our description to these three terms, just for simplicity (needless to say, other terms can be considered as well without impairing the approach: electrostatic, torsion angles, ...). The total energy associated with a given configuration is then

$$\sum_{(i,j)\in A} D[d(X_i,X_j)] + \sum_{(i,j,k)\in R} B[\theta(X_i,X_j,X_k)] + \sum_{(i,j)\in W} V[d(X_i,X_j)]$$

in which A, R and W are appropriate subsets of pairs and triples of atoms. Again, what is important is to understand that, in this expression, each individual term depends on the respective distance and angle of the relevant atoms; these in turn depend on the corresponding X-variables. Thus, we are entitled to call f(x) the above expression, if we symbolize by x ∈ R^n the n = 3N unknowns.
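As a rough sketch of how f can be evaluated from x, the code below assumes the Cartesian parametrization (x holds X_1, ..., X_N in order) and implements only the bond-length and valence-angle terms D and B defined above; the van der Waals sum over W would be added in the same way. The tuple layouts of the hypothetical `bonds` and `angles` lists, and the convention that the angle is measured at the middle atom j, are our own choices, not those of any particular force field.

```python
import numpy as np


def bond_energy(X, bonds):
    """Sum of D(d) = 0.5 * delta * (d - d_bar)**2 over tuples (i, j, delta, d_bar)."""
    e = 0.0
    for i, j, delta, d_bar in bonds:
        d = np.linalg.norm(X[i] - X[j])
        e += 0.5 * delta * (d - d_bar) ** 2
    return e


def angle_energy(X, angles):
    """Sum of B(theta) = 0.5 * beta * (theta - theta_bar)**2 over (i, j, k, beta, theta_bar).

    The angle theta is measured at atom j, between the bonds j-i and j-k.
    """
    e = 0.0
    for i, j, k, beta, theta_bar in angles:
        u, v = X[i] - X[j], X[k] - X[j]
        cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        theta = np.arccos(np.clip(cos_t, -1.0, 1.0))  # clip guards against rounding
        e += 0.5 * beta * (theta - theta_bar) ** 2
    return e


def total_energy(x, bonds, angles):
    """f(x) for the flat vector x in R^(3N); row i of X holds the coordinates X_i."""
    X = np.asarray(x, dtype=float).reshape(-1, 3)
    return bond_energy(X, bonds) + angle_energy(X, angles)
```

Whatever x is passed in, even an absurd configuration, the function simply returns a number, which is exactly the "simulator" behaviour described in the text.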

It is conceivable that the computation of f and its derivatives can be programmed, provided that enough time is allotted to do a proper job - and debug it! The result is a set of Fortran subroutines which, taking numerical values of x as input, do all the necessary computations (geometric, trigonometric, ...) and return the numerical values of f and ∇f, and perhaps ∇²f. These computations are done even though the given x may represent an absurd configuration: they just simulate the proposed molecule (any absurdity will be reflected in the resulting f-value anyway; we pass over in silence the question of whether the above expressions are a reasonable model of the physical situation, which is quite another story). In a word, we are exactly in the situation of the tennis practice alluded to in § III.

With the help of this program, a library code can be used to minimize f (or at least try to!). In applications of interest, the number N of atoms is in the 10^3 range, so we are in case 1.c of § II. It is advisable to choose a limited-memory algorithm, using only f and ∇f. We mention here that the sets A, R, W depend on x to some extent, which implies some complication: the human collaboration cannot be totally eliminated between the chemist, who programs the simulator, and the mathematician, responsible for the optimization code.
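Debugging the derivative code of such a simulator is commonly done by comparing the analytic gradient with finite differences. The sketch below assumes only the bond-length term D, Cartesian coordinates, and a hypothetical routine returning the pair (f, ∇f); `max_gradient_error` reports the largest gap with central differences, which should be tiny when the gradient code is correct and large when it is not. The step `h = 1e-6` is an arbitrary but usual choice.

```python
import numpy as np


def bond_energy_and_grad(x, bonds):
    """Return f and grad f for f(x) = sum 0.5*delta*(|X_i - X_j| - d_bar)**2."""
    X = np.asarray(x, dtype=float).reshape(-1, 3)
    g = np.zeros_like(X)
    e = 0.0
    for i, j, delta, d_bar in bonds:
        r = X[i] - X[j]
        d = np.linalg.norm(r)
        e += 0.5 * delta * (d - d_bar) ** 2
        # chain rule: the derivative of d is r/d with respect to X_i and -r/d for X_j
        gi = delta * (d - d_bar) * r / d
        g[i] += gi
        g[j] -= gi
    return e, g.ravel()


def max_gradient_error(fun, x, h=1e-6):
    """Largest absolute gap between the analytic gradient and central differences."""
    _, g = fun(x)
    err = 0.0
    for k in range(x.size):
        e_k = np.zeros_like(x)
        e_k[k] = h
        fd = (fun(x + e_k)[0] - fun(x - e_k)[0]) / (2.0 * h)
        err = max(err, abs(fd - g[k]))
    return err
```

The check costs 2n evaluations of f, so in practice it is run once on a small molecule while developing the simulator, not during the minimization itself.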

A specific software package, called ORAL, has been written by K. Zimmermann, of the Laboratoire de Pharmacologie Macromoléculaire, Institut Gustave Roussy (CNRS). The energy incorporates the AMBER force field, and minimization is done by a limited-memory quasi-Newton code written at INRIA. It typically reaches RMS gradients smaller than 10^-2 kcal/mole/Å in a number of simulations substantially less than the number of atoms (when this number is in the 10^3 range). When confronted with similar problems, the original version of AMBER (based on the conjugate gradient method) simply does not reach such RMS values.
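The stopping criterion quoted above can be made concrete as follows, assuming the usual definition of the RMS gradient as the root mean square of the n = 3N gradient components; the tolerance 10^-2 kcal/mole/Å is the figure from the text, and the function names are illustrative.

```python
import numpy as np


def rms_gradient(grad):
    """Root mean square of the n = 3N gradient components (kcal/mole/Å)."""
    g = np.asarray(grad, dtype=float).ravel()
    return float(np.sqrt(np.mean(g ** 2)))


def converged(grad, tol=1e-2):
    """Stopping test of the kind quoted in the text: RMS gradient below tol."""
    return rms_gradient(grad) < tol
```

Because it averages over all 3N components, a small RMS value can still hide a few atoms with large forces; a maximum-component test is sometimes used alongside it.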

For an illustration, we used a molecule having 1184 atoms (B-DNA d(A)is.d(T)is); it appeared to be fairly difficult to optimize. An unbounded cutoff was used (i.e. the sets A, R and W above were constantly maximal; otherwise AMBER had difficulties with the discontinuous f). The evolution of the energy is plotted in Figure 1, as a function of the number of simulations, for AMBER (dashed line) and ORAL. The left part displays the first 500 simulations (the starting energy was 1747 kcal/mole), and the tail appears on the right part, with a dilated vertical scale. After some 1400 simulations, the RMS values were 0.069 and 0.014 for AMBER and ORAL respectively. Note that speeds should be compared along horizontal lines rather than vertical ones: for example, the level -700 was reached by AMBER in some 800 simulations, while 400 sufficed for ORAL.
Figure 1. Behaviour of AMBER and ORAL


VII.  SHORT  BIBLIOGRAPHY 


For  an  introduction  hardly  more  developed  than  the  present  text,  see 

C. Lemarechal: Optimisation, in: Freds d'Automatique. Techniques de l'Ingénieur, 21 rue Cassette, 75006 Paris (1983).

For more complete texts, a good French book is

M. Minoux: Programmation Mathématique. Dunod, Paris (1983).
while many books are available in English, for example

R.  Fletcher:  Practical  Methods  of  Optimization.  Wiley,  Chichester  (2nd  Edition,  1987). 

P.E. Gill, W. Murray, M.H. Wright: Practical Optimization. Academic Press, New York (1981).

These two books are absolutely non-technical (theorems are rare, especially in the second). They contain many practical developments, useful for a user not particularly interested in the technical details of optimization.

The best reference for unconstrained problems is probably

J.E. Dennis, R.B. Schnabel: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice Hall, Englewood Cliffs (1983).

Written by two world experts, it strikes a good balance between mathematical rigour and readability. Furthermore, its content is of good quality, in that the study is limited to "good", i.e. efficient and modern, methods. Let us recall that methods for unconstrained problems form the basis for the constrained case.

Excellent  references,  with  particular  attention  to  constrained  problems: 

D.G. Luenberger: Introduction to Linear and Nonlinear Programming. Addison-Wesley, Reading, Massachusetts (2nd Edition, 1984).

D.P. Bertsekas: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York (1982).

Finally,  we  mention  a  popular  but  definitely  outdated  reference : 

D.  J.  Wilde :  Optimum  Seeking  Methods.  Prentice  Hall,  Englewood  Cliffs  (1964). 
DISCUSSION


KOZELKA - 1. We have used ORAL for energy minimization of cis-platinum adducts with oligonucleotides, as well as for calculations of energy barriers between different conformations. ORAL is really very robust, does not get stuck in high-energy states, and the energy goes smoothly and steadily down in the course of the minimization.

2. When comparing the efficiency of different minimizers, I would prefer to have the energy plotted against CPU time rather than against the number of cycles. The CPU time used for one cycle can be quite different in different programs.

LEMARECHAL - Your point 2 is correct in general. More precisely, quasi-Newton algorithms with limited memory need a CPU time T_A = O(n) at each cycle; it increases with the number m of pairs (s, y) alluded to in the paragraph following (5) in my paper. This m, however, must be fairly small, say ≤ 20: see the numerical experiments in

J.C. Gilbert, C. Lemarechal: Some numerical experiments with variable-storage quasi-Newton algorithms. Mathematical Programming 45, 3 (1989, to appear).

In the present class of problems, T_A is smaller than the time T_S spent in the simulator, unless an unrealistically severe cutoff is used to compute the force field. For the case of Figure 1, there is no cutoff: T_S = O(n²), hence T_S + T_A ≈ T_S, and the curves remain the same if the horizontal axis represents the total CPU time. With a reasonable cutoff, the ratio T_S/T_A would be about 10: the number of cycles would still be a sensible measure of speed - and much more accessible than CPU time!


CHOMILIER - The optimization algorithm is able to refine the value of the minimum by iteration, i.e. to give x_{k+1} starting from x_k.

What about the first point x_0 of the problem? How can it be determined? Is there any algorithm to compute it?


LEMARECHAL - Unfortunately, the answer is a definite "no". There are problems with a special structure, well suited to special mathematical tools: convex analysis, duality theory (this is not the case for the present problems, I believe); then a solution can be roughly approximated, say by x_0. Otherwise, no answer can be given to these questions, at least by numerical analysts.
STONE - Can you highlight the differences between ORAL and the routines available in the NAG and Harwell libraries?

And can you comment on the problem of escaping from local minima when the objective is to find the global minimum?

LEMARECHAL - The code is probably comparable to E04DGF of NAG. Technically, the mathematical method is that of VA13A (Harwell). The line-search is similar and the direction is computed by a quasi-Newton formula such as (5). However, the G-matrix is not quite the same: in Harwell, it is computed on the basis of all the pairs (s_1, y_1), (s_2, y_2), ..., (s_k, y_k) and is stored explicitly, so the program needs O(n²) places of memory. In the present program, the matrix uses only (s_{k-m}, y_{k-m}), ..., (s_k, y_k), with m small (see my answer to J. Kozelka above); thus, only O(n) memory is needed, say 30n. This way of computing the direction is described in

J. Nocedal: Updating quasi-Newton matrices with limited storage. Mathematics of Computation 35 (1980) 773-782.
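As a minimal sketch of the limited-storage idea, assuming the standard two-loop recursion associated with Nocedal's paper and a scaled identity as the initial matrix (a common choice, not necessarily the one made in ORAL or VA13A), the quasi-Newton direction -H ∇f(x_k), where H approximates the inverse Hessian, can be computed from the m stored pairs without ever forming an n×n matrix. The function name is hypothetical.

```python
import numpy as np


def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: return d = -H g, with H built from the stored pairs.

    s_list[i] = x_{i+1} - x_i and y_list[i] = grad_{i+1} - grad_i, oldest first;
    every pair is assumed to satisfy s.y > 0.
    """
    q = np.array(grad, dtype=float)
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # first loop: newest pair to oldest
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * np.dot(s, q)
        alphas.append(a)
        q -= a * y
    # initial matrix H0 = gamma * I, scaled with the newest pair
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # second loop: oldest pair to newest (alphas were stored newest first)
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * np.dot(y, r)
        r += s * (a - b)
    return -r
```

Storage is 2m vectors of length n, so with m around 15 it is consistent with the order of magnitude of 30n quoted above; the cost per cycle is also O(mn).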

The question of local vs. global optima is very close to that of J. Chomilier above, and the answer is unfortunately the same: we have no practical tool for (i) eliminating local minima, or (ii) checking global optimality. To realize that the existing tools are fairly weak, see

L.C.W. Dixon, G.P. Szegö: Towards Global Optimization, vols. I (1975) & II (1978), North Holland.

On the other hand, the main trouble seems to be that the optimization codes provide you with stationary points (where the gradient is 0) rather than true local minima (where, in addition, the Hessian matrix is positive (semi)definite). Incidentally, this is probably the case in Figure 1. Now, there exist methods - rather recent and still rather experimental - which are able to avoid local maxima and saddle points, hence finding true local minima (see below my answer to J. Devillers).
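To check whether a stationary point returned by a code is a true local minimum, one can examine the eigenvalues of the Hessian there. The sketch below is only practical for moderate n, since it forms the full Hessian and diagonalizes it; the tolerance and the returned labels are our own assumptions.

```python
import numpy as np


def classify_stationary_point(hessian, tol=1e-8):
    """Classify a point where the gradient vanishes, from the Hessian eigenvalues."""
    w = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))  # symmetric Hessian assumed
    if w.min() > tol:
        return "local minimum"  # strict: all curvatures positive
    if w.max() < -tol:
        return "local maximum"
    if w.min() < -tol and w.max() > tol:
        return "saddle point"
    return "degenerate: second-order test inconclusive"
```

A single clearly negative eigenvalue is enough to show that the point is not a local minimum, whatever the other eigenvalues are.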


SOUMPASIS - Simulated annealing is a positive approach, which "globalizes" local methods via Monte Carlo techniques. To my knowledge, it has not been applied to molecular reconstruction (except perhaps in crystallography), but good success has been reported in combinatorics, for example for the travelling salesman problem.
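For readers unfamiliar with the technique, the sketch below shows the basic Metropolis acceptance rule with a geometric cooling schedule on a one-dimensional toy function. The step size, initial temperature, cooling factor and iteration count are arbitrary assumptions, and the function is a generic illustration, not a method applied in this paper. Note that f is evaluated once per iteration, which is precisely the cost raised in the discussion.

```python
import math
import random


def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, n_iter=5000, seed=0):
    """Minimize f on the real line by a Metropolis random walk with decreasing temperature."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(n_iter):
        y = x + rng.uniform(-step, step)  # random "neighbouring" configuration
        fy = f(y)
        # downhill moves are always accepted; uphill ones with probability exp(-(fy - fx)/t)
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling
    return best_x, best_f
```

Early on, the high temperature lets the walk climb out of a local well; as t decreases, the method behaves more and more like a pure descent.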

LEMARECHAL - I have some misgivings concerning simulated annealing in the present context: the objective function must be very simple, because it is going to be computed millions of times. This is not the case for the force field, unless a concept of "neighbouring" configurations is defined, to allow fast computation of f, given its value at some neighbour.
BRTTlfT (comment) - Truncated-Newton and quasi-Newton methods are reminiscent of the adopted-basis Newton-Raphson method, where the last m gradient vectors are used to define a small subspace of most gain. Then a full N-R step is made in this space of dimension 5-10. This method works well for problems of dimension up to 20,000 or 30,000.


DURUP - For the dialogue between physical chemists and numericists, it should be pointed out that the choice of the best algorithm is dependent on (i) whether we are interested in the absolute minimum or in a local minimum, and (ii) the topography of the potential surface; the mathematician should not "ignore" f(x). Moreover, a common language should be developed for describing the topological features which are critical for the definition of the algorithm, which, for example, will not be the same depending on whether there are 1, 1000, 10^10 or zero paths going (monotonically decreasing) from f(x_0) to the minimum, and on what the "widths" of these paths in the R^n space are.

LEMARECHAL - The mutual "ignorance" between the algorithm and the simulator, which I believe is necessary, concerns only the Fortran run-time. It by no means excludes a close collaboration between the algorithm designer and, say, the physical chemist, when necessary; for example, during the phase of choosing an adequate algorithm: the computer is not involved yet.

As for your second point, we numerical analysts are ill-equipped to study the topology of the energy surface. I would rather say that this kind of question is in the field of pure mathematics but, with 10^4 variables, I am not very optimistic. The issue would probably require a proper parametrization of these variables, so as to reduce their number.


DEVILLERS - Concerning molecular mechanics, second derivatives are rather easy to calculate. So, provided that the problem is not too large, what is the superiority of Davidon-Fletcher-Powell type optimization methods over Newton-Raphson type methods?


LEMARECHAL - Newton-Raphson (N-R) methods have the definite advantage of converging much faster: instead of the 10^3 cycles reported in Figure 1, 10^2 would already reduce the RMS to much lower values. The price is the need to solve the linear system (4). Also, the method rushes (when it works) to the next stationary point, which most probably is not a local minimum.

I believe, however, that N-R methods are best for the present problem, and that their disadvantages can be turned to advantage over quasi-Newton methods. Indeed, (4) sets to 0 the gradient of the quadratic approximation - call it f_q - of f near x_k; this is unwise if f_q is not convex. A much better idea is to minimize f_q by some economical method such as conjugate gradient, and only approximately. In particular, such a local minimization is interrupted when the trial direction is downhill and has negative curvature (thus indicating that f_q is unbounded from below, and that N-R is not appropriate). This approach is described in:

R.S.  Dembo,  T.  Steihaug :  Truncated-Newton  algorithms  for  large  scale  unconstrained 
optimization.  Mathematical  Programming  26,2  (1983)  190-212. 

In addition to being implementable for large-scale optimization, it avoids saddle points of f. With J. Navaza (Fac. Pharmacie, Univ. Paris-Sud), we have already obtained encouraging results for the phase problem in crystallography; it should certainly be tried for energy minimization as well.
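A minimal sketch of the truncated-Newton idea described above, assuming a hypothetical routine `hess_vec` that returns Hessian-vector products (so the Hessian is never stored): conjugate gradient is applied to the quadratic model f_q and stopped early, either when the residual is small or when a direction of nonpositive curvature appears. The fixed relative tolerance `rtol` is a simplification; Dembo and Steihaug let the inner tolerance vary from one outer iteration to the next.

```python
import numpy as np


def truncated_newton_direction(grad, hess_vec, max_iter=50, rtol=0.1):
    """Approximately minimize f_q(d) = g.d + 0.5 d^T H d by conjugate gradient.

    The inner loop stops when the residual is small (truncation) or when a
    direction of nonpositive curvature is met (f_q unbounded below along it).
    """
    g = np.asarray(grad, dtype=float)
    d = np.zeros_like(g)
    r = -g.copy()  # residual -(g + H d) at d = 0
    p = r.copy()
    rr = r @ r
    g_norm = np.sqrt(rr)
    for _ in range(max_iter):
        Hp = hess_vec(p)
        curv = p @ Hp
        if curv <= 0:
            # negative curvature: stop; at the very first step fall back on -g
            return d if d.any() else -g
        alpha = rr / curv
        d += alpha * p
        r -= alpha * Hp
        rr_new = r @ r
        if np.sqrt(rr_new) <= rtol * g_norm:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return d
```

Every direction returned is downhill for f at x_k, so it can be combined with the usual line-search; when H is positive definite and `rtol` is tiny, the result is the plain N-R step.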

Now, when the problem is not too large, (4) can be solved explicitly, but there remains the difficulty of negative eigenvalues. To overcome it, one can minimize f_q on a ball of radius t > 0, centred at x_k. Even though f_q is not convex, its global minimum on the ball can then be computed. Furthermore, the line-search is replaced by an adjustment of t at each iteration k. We obtain a "trust-region" algorithm, which captures all the advantages of N-R, and which converges to a true local minimum. See a review in:

J. Moré: Recent developments in algorithms and software for trust region methods. In A. Bachem, M. Grötschel, B. Korte (Eds.): Mathematical Programming, the State of the Art; Springer Verlag (1983) 258-287.
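For small n, the ball-constrained subproblem described above can be solved through an eigendecomposition of the Hessian: the minimizer has the form d = -(H + λI)^{-1} g with λ ≥ max(0, -λ_min), and ||d|| = t whenever the constraint is active. The sketch below finds λ by bisection and deliberately ignores the so-called hard case (g orthogonal to the eigenvector of the smallest eigenvalue); `radius` plays the role of t and must be positive, and the function name is hypothetical.

```python
import numpy as np


def trust_region_step(g, H, radius, tol=1e-10):
    """Minimize f_q(d) = g.d + 0.5 d^T H d subject to ||d|| <= radius (hard case ignored)."""
    w, V = np.linalg.eigh(H)  # eigenvalues in ascending order
    gt = V.T @ g              # gradient expressed in the eigenbasis of H

    def step(lam):
        return -V @ (gt / (w + lam))

    if w[0] > 0:
        d = step(0.0)
        if np.linalg.norm(d) <= radius:
            return d          # the N-R step already lies inside the ball
    lo = max(0.0, -w[0]) + tol  # just above the smallest admissible lambda
    hi = lo + 1.0
    while np.linalg.norm(step(hi)) > radius:
        hi *= 2.0             # ||step(lam)|| decreases to 0 as lam grows
    for _ in range(200):      # bisection on ||step(lam)|| = radius
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return step(hi)
```

In a full trust-region algorithm, t would then be enlarged or reduced according to how well f_q predicted the actual decrease of f.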
DESCRIBING PROTEIN CONFORMATION: A NEW MATHEMATICAL APPROACH


K. ZAKRZEWSKA and R. LAVERY


Laboratoire de Biochimie Théorique, Institut de Biologie Physico-Chimique, 13, rue Pierre et Marie Curie, Paris 75005 (France)


ABSTRACT 

An algorithm, P-Curves, recently developed in our laboratory, enables a generalized helicoidal description of protein structure and yields both an overall axis describing the folding of the protein backbone and a full set of helicoidal parameters locating each peptide in space. Two applications of this algorithm are presented. Firstly, P-Curves is used to automatically detect secondary structures within proteins, the secondary structure limits being determined statistically with respect to random polypeptide chain conformations. Secondly, the method is used to search for and quantify conformational similarities between homologous proteins.


INTRODUCTION 

Although  a  considerable  number  of  protein  structures  have  now 
been  crystallographically  determined  to  high  resolution  there  are 
still  difficulties  in  extracting  all  the  conformational  data  that 
these  results  contain.  In  particular,  there  is  no  rigorous 
procedure  for  the  precise  location  of  secondary  structures  and  the 
description  of  their  deformations,  or  for  determining  the  exact 
pathway  followed  by  a  folded  polypeptide  backbone. 

We have thus been led to develop a rigorous algorithm for dealing with such problems. Our starting point was to look for a way to extend helical geometry to irregular systems so that the notion of a helical axis could be conserved, although this axis would in general be a space curve rather than a straight line. The first such algorithm was developed in our laboratory for analysing the irregular conformation of nucleic acids (refs. 1-2), and soon afterwards the adaptations needed to treat proteins were made (ref. 3), yielding the algorithm named P-Curves.

P-Curves has obvious applications for describing protein folding patterns and for the automatic comparison of related proteins. It can be applied to any protein for which the atomic coordinates are known, from crystallographic data banks or from molecular mechanics or molecular dynamics studies. In this paper we will describe two applications of the method: firstly, an automatic detection of secondary structures and, secondly, an analysis of fine differences between homologous protein conformations.

P-CURVES  METHODOLOGY 

We start by briefly presenting the P-Curves algorithm, the full mathematical details of which can be found in our earlier publications (refs. 1-3). The most fundamental aspect of the algorithm we have developed concerns the definition of an axis or, more precisely, a space curve, which will describe the way the
molecule  studied  is  folded  in  3  dimensions.  In  the  case  of  a 
perfect  helix,  this  axis  is  a  rigorously  defined  straight  line, 
termed  the  helical  axis.  Every  monomer  has  the  same  position  and 
orientation  with  respect  to  its  local  axis  segment  and  each 
monomer  in  the  structure  can  be  reached  from  the  preceding  monomer 
by  a  fixed  rotation  coupled  with  a  fixed  translation  along  the 
axis. 
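The perfect-helix case can be made concrete: applying the same screw operation (a fixed rotation about the axis followed by a fixed translation along it) to each monomer in turn generates the whole structure. In the sketch below the helical axis is taken as the z axis and a single reference point stands for each monomer; the example values used in the check (about 100° and 1.5 Å per residue, radius 2.3 Å) are only meant to resemble an alpha helix and are not taken from P-Curves.

```python
import numpy as np


def screw_positions(p0, twist_deg, rise, n):
    """Positions of n monomers of a perfect helix whose axis is the z axis.

    Each monomer is obtained from the previous one by a rotation of twist_deg
    about the axis combined with a translation of rise along it.
    """
    t = np.radians(twist_deg)
    R = np.array([[np.cos(t), -np.sin(t), 0.0],
                  [np.sin(t), np.cos(t), 0.0],
                  [0.0, 0.0, 1.0]])
    shift = np.array([0.0, 0.0, rise])
    pts = [np.asarray(p0, dtype=float)]
    for _ in range(n - 1):
        pts.append(R @ pts[-1] + shift)
    return np.array(pts)
```

Every point generated this way is at the same distance from the axis and successive points are equally spaced, which is the regularity whose departures P-Curves measures in irregular structures.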

The P-Curves approach is a natural extension of this idea to the case of irregular systems. Its basis is the definition of a function which describes departure from perfect helical symmetry in terms of the curvature and dislocation of the axis describing the polymer and in terms of changes in the position of successive monomers with respect to this axis. Minimisation of this function yields the optimal space curve describing the polymer, where both types of irregularity have been smoothed in a least-squares sense. Since the function is constructed so as to take into account simultaneously the position of all the monomeric peptide units making up the polymer, the final description of any one of these units depends on the position of its neighbors. This leads to a much more coherent view of the overall conformation than that obtained with any purely local parameters such as the backbone torsion angles.

After  optimisation  the  irregularity  of  each  pair  of  adjacent 
peptides  within  the  structure  can  be  measured,  to  yield  a  value 
called the dimeric irregularity function or 'DIF'. The DIF
function  is  particularly  helpful  in  locating  protein  secondary 
structure  zones  without  having  to  make  reference  to  local 
properties  such  as  phi/psi  angles  or  hydrogen  bond  geometries. 

APPLICATIONS 

The  first  application  of  P-Curves  methodology  concerns  the 
automatic  detection  of  secondary  structure  zones  within  proteins. 
In  a  regular  helix  ("helix"  in  this  sense  includes  also  ribbon  and 
sheet  conformations)  the  DIF  values  are  zero.  In  order  to  detect 
secondary  structures  in  real  proteins,  some  degree  of  irregularity 
has  to  be  accepted.  The  limiting  DIF  values  for  segments  of 
different  length  were  consequently  determined  with  respect  to 
random-conformation polypeptide chains. To this end, 26 proteins containing a total of roughly 3000 peptides, resolved to at least 1.8 Å, were selected from the Brookhaven Protein Data Bank (ref. 4). These proteins were analysed for the proportion of glycine and proline. Twenty random-conformation polypeptides were then constructed, with lengths varying between 100 and 400 peptides, by sampling phi and psi angles from the accessible regions of the Ramachandran plot (refs. 5, 6). Glycine and proline residues were located randomly within these polypeptides in a proportion corresponding to that found in the real proteins, and their conformational angles were sampled from the phi/psi maps correctly describing their respective allowed zones.

The  random  polypeptides  created  in  this  way  were  subsequently 
analysed  by  the  P-Curves  program  and  maximum  DIF  values  were 
calculated  over  all  fragments  with  lengths  from  1  to  30  peptides 
long. In Fig 1 we give several examples of the distributions obtained for fragment lengths of 4, 8, 12 and 16 peptides (the abscissa gives the maximum fragment DIF values and the ordinate shows their relative frequency). In Fig 2 the same distributions
are  shown  for  our  set  of  well-resolved  proteins.  Very  high 
frequencies  can  be  seen  in  the  region  of  small  DIF  values, 
corresponding  effectively  to  secondary  structure  zones.  From  these 
distributions  we  were  able  to  calculate  the  maximal  DIF  value 
which  would  correspond  to  the  existence  of  secondary  structure. 
Initial  trials  indicated  that  a  value  corresponding  to  5%  random 
probability  was  the  most  appropriate  choice. 
Figure 1. DIF distributions for random polypeptides. Distributions
are  shown  for  fragments  of  length:  4,8,12,16  peptides. 
Figure 2. DIF distributions for true proteins. Distributions are
shown  for  fragments  of  length:  4,8,12,16  peptides. 
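One possible reading of the threshold procedure described above can be sketched as follows. It assumes that the "maximum fragment DIF" is the largest DIF value over a window of consecutive dipeptides, and that the limit for each fragment length is the 5% quantile of the random-chain distribution, so that only 5% of random fragments would look as regular. The function names and the toy inputs in the checks are hypothetical, not part of the P-Curves program.

```python
import numpy as np


def max_dif_per_fragment(dif, length):
    """Maximum DIF over every window of `length` consecutive dipeptide values."""
    dif = np.asarray(dif, dtype=float)
    return np.array([dif[i:i + length].max() for i in range(len(dif) - length + 1)])


def dif_threshold(random_chains, length, prob=0.05):
    """DIF limit below which only a fraction `prob` of random fragments fall."""
    values = np.concatenate([max_dif_per_fragment(c, length) for c in random_chains])
    return float(np.quantile(values, prob))


def regular_windows(dif, length, threshold):
    """Start indices of windows whose maximum DIF does not exceed the threshold."""
    return np.flatnonzero(max_dif_per_fragment(dif, length) <= threshold)
```

One threshold would be computed per fragment length (1 to 30 peptides in the text), since longer fragments have larger maximum DIF values even in random chains.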


Having  determined  a  set  of  limiting  DIF  values  for  each 
fragment  length  it  is  now  possible  to  automatically  detect 
secondary  structure  regions.  In  order  to  distinguish  between  the 
different  types  of  structure  which  may  occur  we  simply  inspect  the 
helicoidal  parameters  of  the  peptides  in  each  zone.  We  have  shown 
(table  II,  ref. 3)  that,  with  the  exception  of  Xdisp,  all  the 
inter-peptide  and  peptide-axis  parameters  can  be  used  to  identify 
secondary structure types. As an example, Fig. 3 shows the distribution of Ydisp in our protein set. Two distinct peaks are present: the first, centred at the origin, corresponds to beta sheets, while the second, at 1.5 Å, corresponds to alpha helices. The small shoulder at 1.1 Å corresponds to 3/10 helices.



Figure  3.  Ydisp  distribution  for  the  chosen  set  of  well  resolved 
proteins. 
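As a deliberately simplified illustration, a regular zone could be assigned to the structure type whose Ydisp peak (0, 1.1 and 1.5 Å, as quoted above) is closest to the zone's mean Ydisp. This nearest-peak rule and the names below are our own assumptions; the text states that several inter-peptide and peptide-axis parameters can be used, not Ydisp alone.

```python
import numpy as np

# Peak positions of the Ydisp distribution quoted in the text (Å)
YDISP_PEAKS = {"beta sheet": 0.0, "3/10 helix": 1.1, "alpha helix": 1.5}


def classify_zone(ydisp_values):
    """Assign a regular zone to the type whose Ydisp peak is closest to its mean Ydisp."""
    mean = float(np.mean(ydisp_values))
    return min(YDISP_PEAKS, key=lambda name: abs(YDISP_PEAKS[name] - mean))
```

Because the 3/10 shoulder lies only 0.4 Å from the alpha-helix peak, a rule based on a single parameter is fragile, which is one reason to combine several parameters.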

The application of P-Curves to the automatic detection of secondary structures is illustrated by two specific proteins. The first one is erythrocruorin from Chironomus thummi thummi, whose "fingerprint" of DIF values for all successive dipeptides is shown in fig 4. This figure indicates 6 regions of secondary structure, which were all identified as alpha helices. Fig 5 shows our so-called "ribbon representation" (ref. 3) of the protein structure. The curved line represents the generalised helical axis characterising its folding, while the ribbon shows the position of the polypeptide backbone. In order to make the secondary structures more visible, the ribbon corresponding to irregular segments has been omitted. The second protein presented is crystallin from the bovine eye lens. Its fingerprint (fig 6) shows a heavy domination of irregular structure. Twelve short beta sheets were nevertheless located. The corresponding ribbon diagram is given in fig 7.



Figure  8.  Superposition  of  DIF  values  for  ribonuclease  A  (dashed 
line)  and  S-protein  (full  line). 

The other application of P-Curves that we will briefly describe concerns the detection of fine conformational changes within proteins. The example we present concerns the effect of cleaving the S-peptide (the first 20 residues) from ribonuclease A. In fig 8, the fingerprints for both the cleaved and the complete structures are superposed, the full line corresponding to the S-protein and the dashed line to the full protein. It can immediately be seen that these two molecules yield almost identical results. The only differences can be found in the vicinity of peptide 35 and in the region of peptides 85-90. The similarity of the two structures is confirmed in Fig 9, where for clarity only the helical axis is given. The two conformations are indeed seen to be similar; however, one difference, consisting of a hinge motion opening outwards the loops on the right-hand side of the figure, is visible. We can thus see that P-Curves analysis and, in particular, the DIF "fingerprint" is a powerful way of localising and quantifying global conformational changes.


Figure 9. Helical axis for ribonuclease A (dashed line) and S-protein (full line).

CONCLUSIONS 

We  have  demonstrated  that  the  P-Curves  algorithm  can  be  used 
to  quantitatively  describe  protein  conformation.  The  algorithm 
leads  to  a  full  set  of  independent,  commuting  parameters  and,  most 
importantly,  to  a  global  axis  which  concisely  describes  the 
macromolecular  conformation.  An  automatic  detection  of  the 
secondary  structure  based  on  precise,  statistically  defined  rules, 
has  also  been  developed.  Overall,  the  method  is  useful  for 
comparing  the  structures  of  different  proteins  and  for  detecting 
and  defining  conformational  changes  upon,  for  example,  substrate 
binding.  It  is  also  certainly  of  interest  for  interpreting  dynamic 
simulations  of  proteins,  where  the  extraction  of  clear 
conformational  data  is  often  a  major  problem. 

P-Curves is currently being extended to the analysis of irregular zones such as beta turns (ref. 7) and to the definition of the spatial arrangement of secondary structure motifs within proteins.
DISCUSSION


DURUP - 1) How, i.e. according to what kind of sampling, did you define what you used as "random polypeptides"?

2) Did you develop a method for describing the relative positions of two helices, i.e. the most important parameters characterizing a loop?

ZAKRZEWSKA - 1) The random polypeptides were constructed by sampling the phi/psi torsion angles from the allowed regions of the Ramachandran map, taking into account the nature of the peptide involved: glycine, proline or others.

2)  I  am  working  on  this  problem  at  the  moment. 
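The sampling described in answer 1) can be sketched as follows. This is a minimal illustration, assuming crude rectangular "allowed" boxes on the Ramachandran map; the box limits, the treatment of glycine as unrestricted and all function names are hypothetical, not the regions used by the authors:

```python
import random

# Hypothetical allowed boxes (phi_min, phi_max, psi_min, psi_max), in degrees.
# Illustrative only; these are NOT the regions used in the original work.
ALLOWED = {
    "general": [(-160.0, -50.0, 100.0, 180.0),    # beta region
                (-100.0, -40.0, -70.0, -20.0)],   # right-handed alpha region
    "proline": [(-80.0, -50.0, -60.0, 170.0)],    # phi restricted by the ring
    "glycine": [(-180.0, 180.0, -180.0, 180.0)],  # treated as unrestricted here
}

def residue_class(residue):
    """Map a one-letter residue code to a sampling class."""
    if residue == "G":
        return "glycine"
    if residue == "P":
        return "proline"
    return "general"

def sample_phi_psi(residue, rng):
    """Draw one (phi, psi) pair uniformly over the union of allowed boxes."""
    boxes = ALLOWED[residue_class(residue)]
    # pick a box with probability proportional to its area
    areas = [(b[1] - b[0]) * (b[3] - b[2]) for b in boxes]
    phi_min, phi_max, psi_min, psi_max = rng.choices(boxes, weights=areas, k=1)[0]
    return rng.uniform(phi_min, phi_max), rng.uniform(psi_min, psi_max)

def random_polypeptide(sequence, seed=None):
    """Return one (phi, psi) pair per residue of `sequence`."""
    rng = random.Random(seed)
    return [sample_phi_psi(res, rng) for res in sequence]

angles = random_polypeptide("AGPLKG", seed=1)
```

Weighting the boxes by area keeps the draw uniform over the union of allowed regions; a finer treatment would replace the boxes with a gridded probability map.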
SUMMARY

GENMOL is an interactive force field program written in
Fortran 77 (~12000 lines), running on any 32-bit computer equipped with
a graphic terminal, or on a graphic station under PHIGS. The program is
designed to build any molecule, from a few atoms to hundreds of atoms.
Its speed (10 to 100 times faster than MM2) allows one to perform a
conformational analysis at each step of the molecular building. When
refining, the program considers both the total strain energy and the local
strain energies. π systems are taken into account. The preferred
conformations of macrocycles are generated by a special algorithm. In this
work, an application of the program is given, with the aim of deciphering
the origin of the psychotonic or sedative activity of tricyclic antidepressant
drugs from a conformational analysis.


INTRODUCTION 

X-ray diffraction is the standard method for solving the 3D
structure of molecular compounds. Unfortunately, the technique is laborious
and time-consuming, and not always conclusive. GENMOL was
initially designed as an alternative, to yield specific structures that are
expected to be close to previously reported structural variants.

The program should be able to keep the common part of the
molecules rigid and to perform a conformational analysis on the varying part.
Further developments of the program led us to model π systems
and to find the most probable conformations of macrocycles.

In previous work (ref. 1) on tricyclic neuroactive compounds
(neuroleptics and antidepressants), we pointed out the importance of
conformation during the recognition step between the drug and the
neurotransmitter receptor.

The results of the calculations performed on X-ray structures led us
to adopt a lock-and-key interaction model, which means that the drug
molecules interact with the receptor without conformational fit. Since
antidepressant drugs exhibit a psychotonic or sedative effect depending
on their interaction with noradrenaline or serotonin receptors, in the
lock-and-key model the preferred conformation of a psychotonic drug
must fit the noradrenaline receptors, while that of a sedative drug must
fit the serotonin receptors.

In order to gain some insight into this mechanism, GENMOL
was used to compute and compare the preferred conformations of the
antidepressant drugs (ref. 2) and those of the corresponding
neurotransmitters.

THE  PROGRAM 

The program is designed to build or modify molecules; several
libraries of molecular fragments and of molecules are available.

The principal feature of the program is the concept of pivots, which
are all the interatomic bonds of linear chains and all the axes passing
through non-adjacent atoms of the rings. Molecular deformations
are performed by rotation around the pivots; when GENMOL is run with the
pivot option, bond lengths are kept fixed. The program can perform a
complete force field calculation if needed. Parameters for the stretching and
bending potential functions are derived from MM2 (ref. 3). Those for the van
der Waals and hydrogen bond potentials are taken from ECEPP (ref. 4).
Electrostatic interactions are calculated in a monopole approximation, with
net atomic charges computed with:

1) Del Re's method (ref. 5) for the σ part, with a new set of atomic
parameters (ref. 6);

2) an empirical approximation for the π part, giving charges equivalent to
those of Pariser and Parr (ref. 7).
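Rotation about a pivot amounts to a rigid rotation of the atoms on one side of an axis defined by two atoms (a chain bond, or two non-adjacent ring atoms). A minimal sketch using Rodrigues' rotation formula, assuming the caller already knows which atoms move; the function name and the `moving` index list are hypothetical, not GENMOL's actual routines:

```python
import numpy as np

def rotate_about_pivot(coords, a, b, moving, angle_deg):
    """Rigidly rotate the atoms in `moving` about the axis through atoms a and b.

    coords    : (N, 3) array of Cartesian coordinates
    a, b      : indices of the two atoms defining the pivot (a chain bond, or two
                non-adjacent ring atoms)
    moving    : indices of the atoms carried by the rotation
    angle_deg : rotation angle in degrees (right-hand rule about a -> b)
    """
    coords = np.asarray(coords, dtype=float)
    new = coords.copy()
    origin = coords[a]
    axis = coords[b] - origin
    axis /= np.linalg.norm(axis)                  # unit vector along the pivot
    theta = np.radians(angle_deg)
    v = coords[moving] - origin                   # positions relative to the axis
    # Rodrigues' rotation formula, applied to every moving atom at once
    v_rot = (v * np.cos(theta)
             + np.cross(axis, v) * np.sin(theta)
             + np.outer(v @ axis, axis) * (1.0 - np.cos(theta)))
    new[moving] = v_rot + origin
    return new

coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0], [3.5, 1.4, 0.3]])
rotated = rotate_about_pivot(coords, 1, 2, [3], 120.0)
```

Because every moved atom undergoes the same rigid rotation about an axis through atoms `a` and `b`, its distances to those two atoms and to the other moved atoms are unchanged, which is the property the pivot option relies on to keep bond lengths fixed.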
Particularities of GENMOL
* Intelligence :

-  Defines  the  hierarchy  of  rotations. 

-  Finds  the  nature  of  each  bond  : 

(4 types of bonds : single bond, double bond, linearly conjugated
double bond (X=X....X=X), aromatic and triple bond) in order to allow
deformation of π systems.

-  Gives  the  preferred  conformations  of  a  molecule,  taking  into  account  the 
cycle  conformations. 

- Automatically corrects input errors, with warning messages.

* Speed : 10 to 100 times faster than MM2, which is necessary to perform
conformational analysis.

* Automated building of peptides and nucleic acids (DNA and RNA).

* Technical characteristics :

.  Language  :  Fortran  77. 

.  Size  :  ~ 12000  lines. 

. Computer : Any 32-bit computer equipped with a graphic terminal or
any  graphic  station  under  PHIGS. 

.  Size  of  built  molecules  :  3  atoms  to  hundreds  of  atoms. 


APPLICATION TO ANTIDEPRESSANT DRUGS AND RESULTS

Antidepressant drugs preferentially block the presynaptic receptors
of noradrenaline and serotonin (5-HT, for 5-hydroxytryptamine). From a
statistical point of view, psychotonics are secondary amines acting on
noradrenergic receptors, while sedatives are tertiary amines acting on
serotoninergic receptors (ref. 8). Note that amine substitution has
both conformational and electronic effects.
The conformations of 19 antidepressants were generated and then
compared with the preferred conformations of the neurotransmitters. Only
conformations with a differential strain energy ΔEs lower than 10 kcal mol⁻¹
were kept in the analysis, ΔEs being the difference in strain energy from the
most stable conformation. The total numbers of conformations of the molecules
studied here, and the origin of their basic structure (X-ray or calculated),
are gathered in Table 1.
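The selection rule just described, ΔEs = Es - min(Es) < 10 kcal mol⁻¹, can be written as a short filter. A minimal sketch; the function name and the strain energies in the example are hypothetical, not values from this work:

```python
def low_strain_conformations(strain_energies, cutoff=10.0):
    """Keep conformations whose differential strain energy is below `cutoff`.

    strain_energies : dict mapping conformation id -> strain energy (kcal/mol)
    Returns a dict id -> Delta E_s (relative to the most stable conformation),
    ordered by increasing Delta E_s.
    """
    e_min = min(strain_energies.values())
    delta = {cid: e - e_min for cid, e in strain_energies.items()}
    kept = {cid: d for cid, d in delta.items() if d < cutoff}
    return dict(sorted(kept.items(), key=lambda item: item[1]))

# Hypothetical strain energies (kcal/mol) for five conformations of one drug
energies = {1: 12.3, 2: 15.1, 3: 24.0, 4: 12.9, 5: 30.2}
kept = low_strain_conformations(energies)
```

Here conformations 3 and 5 (ΔEs of about 11.7 and 17.9 kcal/mol) are discarded, and the remaining three are returned in the order 1, 4, 2.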

Table 1. Antidepressant drugs and numbers of conformations analysed (XR:
structure from X-ray; CAL: calculated structure), ordered on a scale going
from psychotonic to sedative effect. N is the compound number used in
Table 2 and Figure 1; the last column gives the total number of
conformations with ΔEs < 10 kcal/mol.

Compound        Symbol   N    Origin   Ref.   Conformations
                                              (ΔEs < 10 kcal/mol)
Noradrenaline   NAD      -    XR       9      7
Serotonin       SER      -    XR       10     8
Histamine       HIS      -    XR       11     9
Amineptine      AMI      1    XR       12     3
Metapramine     MET      2    CAL      -      1
Quinupramine    QUI      3    XR       13     1
Desipramine     DES      4    CAL      -      13
Protriptyline   PRO      5    CAL      -      14
Nortriptyline   NOR      6    CAL      -      15
Imipramine      IMI      7    XR       14     38
Clomipramine    CLO      8    XR       15     24
Demexiptiline   DEM      9    CAL      -      27
Propizepine     PRP      10   CAL      -      3
Dosulepine      DOS      11   CAL      -      8
Amoxapine       AMO      12   CAL      -      3
Butriptyline    BUT      13   CAL      -      16
Doxepine        DOX      14   CAL      -      10
Opipramol       OPI      15   CAL      -      59
Noxiptiline     NOX      16   XR       16     20
Amitriptyline   AMT      17   CAL      -      13
Maprotiline     MAP      18   CAL      -      10
Trimipramine    TRI      19   CAL      -      5
From Boltzmann statistics, within the accuracy limits of our calculations
and at human body temperature, we can anticipate the presence in the
synapse of only one conformation of noradrenaline (ND1) and of three
conformations of 5-HT (SR1, SR2, SR3), the differential strain energies of
SR2 and SR3 being 0.5 and 1.5 kcal mol⁻¹ respectively.
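As a rough check of this argument, the relative Boltzmann populations p_i = exp(-ΔE_i/RT) / Σ_j exp(-ΔE_j/RT) of SR1, SR2 and SR3 can be computed from the quoted strain energies. A minimal sketch, assuming that ΔEs can stand in for free-energy differences (entropy and degeneracy neglected) and taking body temperature as 310.15 K:

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def boltzmann_populations(delta_e, temperature=310.15):
    """Normalized Boltzmann weights exp(-dE/RT) for energies in kcal/mol."""
    rt = R_KCAL * temperature
    weights = [math.exp(-de / rt) for de in delta_e]
    total = sum(weights)
    return [w / total for w in weights]

# Differential strain energies of SR1, SR2 and SR3 quoted above (kcal/mol)
pops = boltzmann_populations([0.0, 0.5, 1.5])
```

Under these assumptions SR1, SR2 and SR3 come out at roughly 65 %, 29 % and 6 %, so all three are appreciably populated.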




Table 2 gives the conformation numbers and the corresponding strain
energies of the drug conformations that superimpose with the preferred
conformations of the neurotransmitters (the number identifies each
conformation, so that one can see whether the same conformation interacts
with the different neurotransmitter receptors). Superimpositions with SR2
are absent because most of them have a differential strain energy greater than zero.
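The text does not specify how the superimpositions were scored. One common criterion, given here only as an assumption, is the root-mean-square deviation between matched atoms after optimal rigid-body superposition (the Kabsch algorithm). A minimal sketch; choosing which atoms of a drug and of a neurotransmitter are matched is left to the caller, and the function name is hypothetical:

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between matched (N, 3) point sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                           # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T # optimal rotation taking p onto q
    diff = (rot @ p_c.T).T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

A low RMSD for a drug conformation with ΔEs close to zero would then correspond to a superimposition "without conformational fit" in the sense used here.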

Table 2 - The best superimpositions between drug conformations and the
preferred conformations of noradrenaline and 5-HT present in the
synapse (conf.: conformation number; energy: strain energy of that drug
conformation, kcal/mol; -: no superimposition).

Drug            with ND1           with SR1           with SR3
Symbol  num.    conf.  energy      conf.  energy      conf.  energy
AMI     1       -      -           -      -           -      -
MET     2       2      0.000       2      0.000       -      -
QUI     3       -      -           -      -           -      -
DES     4       11     1.70        11     1.70        10     0.00
PRO     5       -      -           2      2.80        8      0.80
NOR     6       14     0.00        14     0.00        13     0.40
IMI     7       3      9.40        8      1.50        12     0.00
CLO     8       16     6.10        16     6.10        24     0.00
DEM     9       9      3.10        9      3.10        27     0.00
PRP     10      3      0.00        -      -           3      0.00
DOS     11      8      0.40        -      -           -      -
AMO     12      1      0.00        1      0.00        -      -
BUT     13      16     0.00        16     0.00        7      4.70
DOX     14      10     0.00        -      -           3      0.70
OPI     15      3      6.90        3      6.90        38     0.00
NOX     16      13     8.50        5      0.00        8      1.50
AMT     17      6      0.00        6      0.00        9      1.60
MAP     18      -      -           -      -           5      1.60
TRI     19      4      0.00        -      -           3      1.80
The significant results of this analysis are displayed in Figure 1,
where each superimposition with a neuromediator conformation (ND1,
SR1 or SR3) is marked by a dash. In these cases the strain energy of the
drug conformation is zero or very close to zero.


Figure 1. Diagram of superimpositions between the most stable conformations of
the antidepressants and the neurotransmitter conformations present in
the synapse, supporting the possibility that the compounds react with the
corresponding receptors without conformational fit.

[Diagram: rows SR1, SR3 and ND1 against compound number 1 to 19 (Table 1); a dash marks each superimposition.]


CONCLUSION ON THE APPLICATION

Calculating the conformations of the neurotransmitters and of the
tricyclic antidepressant drugs with GENMOL, and comparing them, allows us
to draw some meaningful conclusions regarding the interaction of the
antidepressants with the neurotransmitter receptors:
. Drug molecules and neurotransmitters generally interact with the
receptors in their preferred conformations (without conformational fit).

. The serotoninergic receptors belong to two structurally different
families, while there is only one structural family of noradrenergic
receptors. The psychotonic drugs interact with one family of these 5-HT
receptors and do not interact with the noradrenaline receptors, while the
sedative drugs interact with the other family of 5-HT receptors and with the
noradrenaline receptors.
DISCUSSION


DEVILLERS - 1° Concerning molecules having delocalized π electrons, does your
program perform a quantum mechanical treatment before the force field one ?

2° I have been very impressed by the speed you mentioned for your
program: up to 100 times faster than Molecular Mechanics 2. MM2 is already known to
be fast because of the block-diagonal Newton-Raphson method it uses. This leads me
to ask whether you refine the geometry by changing all the coordinates of every atom at
each step ?


PEPE - 1° The program performs force field calculations only.

2° The program uses a non-classical method to find the best
conformations:

. It moves towards the best conformations using topological
information.

. Once the preferred conformation is found, it is optimized by rotating
around bonds, considering only the non-bonded interactions plus the torsional energy.
The program minimizes all the local energies and computes the total
strain energy only if the deformations are significant.
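The optimization described in this answer, rotating around bonds while monitoring torsional plus non-bonded energy, can be illustrated by a one-dimensional torsion scan. A minimal sketch, assuming an MM2-style three-term torsion function with placeholder V1, V2, V3 values (not GENMOL's or MM2's tabulated parameters); any non-bonded contribution would be added by the caller inside the energy function:

```python
import math

def mm2_torsion(omega_deg, v1, v2, v3):
    """MM2-style torsion energy: V1/2(1+cos w) + V2/2(1-cos 2w) + V3/2(1+cos 3w)."""
    w = math.radians(omega_deg)
    return (0.5 * v1 * (1.0 + math.cos(w))
            + 0.5 * v2 * (1.0 - math.cos(2.0 * w))
            + 0.5 * v3 * (1.0 + math.cos(3.0 * w)))

def scan_torsion(energy, step_deg=5.0):
    """Rotate rigidly about one bond in fixed steps; return (best angle, its energy).

    `energy` maps a dihedral angle in degrees to an energy; a caller would add the
    non-bonded terms of the molecule at that dihedral to the torsional term.
    """
    n_steps = int(round(360.0 / step_deg))
    angles = [-180.0 + i * step_deg for i in range(n_steps)]
    best = min(angles, key=energy)
    return best, energy(best)

# Placeholder parameters (kcal/mol), chosen only for illustration
best_angle, best_energy = scan_torsion(lambda w: mm2_torsion(w, 0.2, 0.27, 0.093))
```

With these placeholder values every term is non-negative and all three vanish only at ±180°, so the scan returns the anti conformation.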


DURUP  -  Just  a  warning.  Since  the  binding  of  the  messenger  to  its  target  aims  at 
inducing  some  process  in  the  target,  many  biochemists  consider  that  this  binding  may 
induce  some  strain,  which  will  be  released  mainly  in  the  transition  state  under 
consideration.  Therefore  such  possibilities  should  be  tested  by  the  model  calculation. 

PEPE - In my model, I only consider that a conformational fit will increase the activation
energy, which will slow down the rate of interaction.


KOZELKA - In your slides you have shown that a conformation of ... generated by
Genmol fits the X-ray structure of the molecule, whereas a conformation coming from
a force-field calculation does not. Does this not simply reflect the facts that

i) the X-ray structure is one of those whose "mean value" was used by Genmol
to define the tricyclic part of the molecule;

ii) the force-field calculations did not include crystal packing forces ?


PEPE - The example was given to show that we must be careful with geometries
obtained from force field calculations. Generally, packing interactions are not
strong enough to influence the geometry of the rigid parts of a molecule, which
is the case for the tricycles of the antidepressant drugs.


SURCOUF - 1° In your examples you have long chains, on which you have performed
a conformational analysis. How do you select all the low-energy conformations ?

2° Do you perform an energy minimization before selecting your
lowest-energy conformations ?


PEPE - 1° The program generates all the conformations of a molecule within a
given range of strain energy; most of the wrong conformations are eliminated on
topological grounds.

2° In the first step of the calculations we select the regions of the preferred
conformations; only the least strained conformations (whose differential strain energy is
lower than a given value, generally 10 kcal mol⁻¹) are refined.


Modelling  of  Molecular  Structures  and  Properties.  Proceedings  of  an  International  Meeting, 
Nancy,  France,  11-15  September  1989,  J.-L.  Rivail  (Ed.) 

Studies  in  Physical  and  Theoretical  Chemistry,  Volume  71,  pages  103-117 
©  1990  Elsevier  Science  Publishers  B.V.,  Amsterdam  —  Printed  in  The  Netherlands 


THE  FREE  ENERGY  OF  INTERCALATION:  THE  STRUCTURE  OF 
GRAPHITE  INTERCALATION  COMPOUNDS 

Zhuo-Min  Chen,  Omar  A.  Karim,*  and  B.  Montgomery  Pettitt 

Department  of  Chemistry,  University  of  Houston,  Houston,  TX  77204-5641 

*Present address: Chemistry Dept., University of North Carolina at Wilmington,
Wilmington, NC

ABSTRACT 


A statistical mechanical theory is developed and applied to study the
structural effects that the thermodynamic state of alkali ions and small
diatomic molecules has on graphite intercalation compounds. The systems
considered are second-stage Rb-graphite and O2-graphite. Two-dimensional
diffraction patterns are computed and compared with experimental
measurements. Sensitivity to model parameters is considered. A low-order
density functional expansion is found to describe adequately the structure of
the system, modeled as a two-dimensional one-component fluid in an anisotropic
external field.

I.  INTRODUCTION 


Knowledge of the interesting physical and chemical properties of the
intercalation compounds of graphite has recently multiplied many-fold owing
to renewed experimental interest.1 Using the techniques of statistical
mechanics,2-4 theories have been developed and applied to study the effects
that ions and molecules have on the structure and thermodynamics of graphite
intercalation compounds. These are theories of the free energy or, equivalently,
of distribution functions for anisotropic liquids, and bear a mathematical
resemblance to current theories of freezing and crystallization. For
the graphite systems, a theoretical understanding of the phenomena of staging
(layering) and the concomitant phase transitions from one stage to another
involves interlayer distribution functions. As a prelude, the details of the
interionic and intermolecular distributions within the graphite layers must
first be calculated before the multiple-layer problems may be approached at
an atomic level of description.

Recent work in this area from our laboratory has been devoted to developing
and implementing new theoretical methods for exploring the equilibrium
properties of graphite intercalation compounds, which will be useful in
understanding the relation between the structure and the associated physical
properties of these compounds, especially the observed catalytic activity.5 An
accurate structural theory will allow the calculation of the thermodynamics
of graphite intercalation compounds and of how free energy differences and
ion or molecule solubilities are related to the registration of the graphite
layers.

A detailed understanding of these facets of lamellar graphite compounds
is of fundamental importance to our understanding of the material properties
for which such compounds are useful. These materials are important for a
number of energy-related systems such as batteries and, in addition, detailed
knowledge of the structure would aid in our understanding of the modulation
and regulation of a number of organic reactions which these materials
catalyze. It is widely recognized that alkali graphite compounds are active as
hydrogenation catalysts for unsaturated hydrocarbons.5 These materials are
also known to catalyze polymerization in a variety of unsaturated organic
compounds.6 The most interesting structural aspect of these reactions is that
the alkali metal cations are thought to be one of the major centers of the
catalytic activity. Thus, large multiatomic reactant molecules must
intercalate between the layers of graphite.

Synthetic metals of graphite are made by incorporating alkali atoms
between certain layers of a graphite lattice. Such materials display a rich
variety of physical properties which are quite distinct from those of ordinary
or true metals. These compounds have a complicated phase diagram due to the
phenomenon of staging.1,7 A stage-n compound is defined as having n intervening
layers of graphite between the layers of alkali metal atoms. A number of
properties, such as band structure, conductivity (and superconductivity),
specific volume and catalytic activity, vary sharply with a change in stage.

An accurate theory of the microscopic structure, used with classical
statistical thermodynamics, would yield a precise picture of the free energy or
chemical potential as a function of the geometric and interaction parameters
of the system. We shall present ways to formulate and use density functional
and integral equation theories of the structure to predict many aspects of
these materials' thermodynamic properties, based on the coordinate-dependent
free energy or the distribution functions of the system.
Below, we briefly outline the theory used to find the distribution of
intercalant atoms in both atomic and molecular systems. Examples then follow.

II.  SIMPLE  FLUID  THEORY 


To make the calculations more tractable, we first use the symmetry of the
system to reduce the three-dimensional problem to a two-dimensional one. The
model we use for the layers of graphite acting on the intercalated fluid
of ions, atoms or molecules is a two-dimensional fluid in the presence of an
external potential. We have chosen to model the total potential energy of the
system, $U(\{\mathbf{r}_i\})$, as a sum of one-body and two-body terms. The
one-body potential, $U^{(1)}(\mathbf{r}_i)$, contains the two-dimensional,
periodically varying interaction of the graphite layers acting on the $i$th ion.
The two-body part of the potential energy of interaction,
$U^{(2)}(\mathbf{r}_i, \mathbf{r}_j)$, represents the effective interactions
between the intercalant species in the system. The total model potential
energy of the system is thereby taken to be

$$
U(\{\mathbf{r}_i\}) = \sum_i U^{(1)}(\mathbf{r}_i) + \sum_i \sum_{j>i} U^{(2)}(\mathbf{r}_i, \mathbf{r}_j) \tag{1}
$$

where the position of atom (ion) $i$ is denoted $\mathbf{r}_i$, and $\{\mathbf{r}_i\}$
represents the collection of all coordinates $\mathbf{r}_i$. In this equation, the
summations extend over all ions in the two-dimensional system.

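Consistent with experimental observations,8,9 we employ a low-order
Fourier expansion of $U^{(1)}(\mathbf{r}_i)$ and choose a two-dimensional Cartesian
coordinate system, $(x, y)$,

$$
U^{(1)}(\mathbf{r}) = 2K\left\{ 2\cos\!\left[\frac{2\pi}{\sqrt{3}\,a}\,y\right]\cos\!\left[\frac{2\pi}{a}\,x\right] + \cos\!\left[\frac{4\pi}{\sqrt{3}\,a}\,y\right] \right\} \tag{2}
$$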
Here, the lattice constant is given by $a = \sqrt{3}\,r_e$, where $r_e$ is the equilibrium
bond distance between graphitic carbon centers.
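To make the shape of this corrugation concrete, the sketch below evaluates a low-order hexagonal Fourier form, 2K{2cos[2πy/(√3a)]cos[2πx/a] + cos[4πy/(√3a)]}, with a = √3 r_e. This form is our reading of the garbled original equation; the amplitude K is a placeholder, and r_e = 1.42 Å is the usual graphite C-C distance rather than a fitted parameter of this work:

```python
import numpy as np

def graphite_potential(x, y, K=1.0, r_e=1.42):
    """Low-order hexagonal Fourier model of the graphite corrugation.

    U(x, y) = 2K {2 cos[2 pi y / (sqrt(3) a)] cos[2 pi x / a] + cos[4 pi y / (sqrt(3) a)]}
    x, y : in-plane coordinates (angstrom); K : amplitude (placeholder);
    r_e  : C-C bond length, giving the lattice constant a = sqrt(3) r_e.
    """
    a = np.sqrt(3.0) * r_e
    gx = 2.0 * np.pi / a
    gy = 2.0 * np.pi / (np.sqrt(3.0) * a)
    return 2.0 * K * (2.0 * np.cos(gy * y) * np.cos(gx * x) + np.cos(2.0 * gy * y))

# Evaluate on a grid covering one rectangular cell (a by sqrt(3) a)
a = np.sqrt(3.0) * 1.42
xs, ys = np.meshgrid(np.linspace(0.0, a, 50), np.linspace(0.0, np.sqrt(3.0) * a, 87))
u_grid = graphite_potential(xs, ys)
```

Every Fourier term uses reciprocal vectors of length 4π/(√3a), so the function is invariant under the lattice translations (a, 0) and (a/2, √3a/2) and ranges between -3K and 6K, as a graphite-registered potential of this form should.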

The choice of $U^{(2)}$ is governed by the choice of intercalant fluid and
by the nature of the electronic interactions between the intercalant and
the graphite sheets, which may give rise to intra-layer screening in the
case of ionic species. In the case of alkali atoms, evidence from electronic
structure calculations and other sources indicates that there is an electron
transfer from the alkali to the graphite.12-15 We have taken the two-body
interaction between ions to be of a screened Coulomb form,

$$
U^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = q_1 q_2\,\frac{\exp(-\kappa\,|\mathbf{r}_1 - \mathbf{r}_2|)}{|\mathbf{r}_1 - \mathbf{r}_2|} \tag{3a}
$$

As all of the ions carry the same charge, $q$, this potential is purely
repulsive. For realistic choices of the screening parameter, $\kappa$, neither a
hard-core nor a Lennard-Jones $r^{-12}$ type of repulsion was found necessary
for our qualitative level of comparison with experiment.

We must extend this to the molecular intercalant as well. In principle,
our total energy could have more terms expressing the chemical bond
energy. Alternatively, one might assume that the bond energy could be included
in the two-body part of the potential. In such a case, the atomic density
equations need not be modified at all. However, one would then have to deal with
the well-known problem of the different magnitudes of the mean forces
responsible for chemical bonding as opposed to physical packing. Instead,
we assume rigid molecules, using an ansatz similar to the RISM theories10 for
molecular fluids, and then derive an equation for the singlet distribution of
atoms which is consistent with this molecular theory.
For the interactions between molecules a two-site Lennard-Jones potential
surface was employed,

$$
U^{(2)} = \sum_{i}\sum_{j} 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \tag{3b}
$$

where the sums run over the sites $i$ of one molecule and the sites $j$ of the other.
Here, $r_{ij}$, $\epsilon_{ij}$ and $\sigma_{ij}$ are the distance between atoms $i$ and $j$ and the
Lennard-Jones well depth and diameter for the atom pair $i$ and $j$. The diatomic
intramolecular potential was taken as a simple rigid bond, which, while
convenient, is not crucial to the analysis that follows and could be replaced with
a more realistic potential form.
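Both pair potentials can be written down directly. A minimal sketch in reduced units, assuming Gaussian-style units for the screened Coulomb term (no 1/4πε0 factor) and a homonuclear diatomic with a single ε and σ for the two-site Lennard-Jones surface; the function names and parameter values are illustrative:

```python
import numpy as np

def screened_coulomb(r, q=1.0, kappa=1.0):
    """Screened Coulomb repulsion q^2 exp(-kappa r) / r between two equal ions."""
    r = np.asarray(r, dtype=float)
    return q * q * np.exp(-kappa * r) / r

def lj(r, eps, sigma):
    """12-6 Lennard-Jones site-site term 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def diatomic_pair_energy(sites_a, sites_b, eps, sigma):
    """Two-site LJ energy between two rigid homonuclear diatomics in the plane.

    sites_a, sites_b : (2, 2) arrays with the in-plane positions of each molecule's sites
    """
    d = np.asarray(sites_a, float)[:, None, :] - np.asarray(sites_b, float)[None, :, :]
    r = np.linalg.norm(d, axis=-1)        # 2x2 matrix of site-site distances
    return float(lj(r, eps, sigma).sum())  # sum over the four site pairs
```

Summing over all four site pairs is what makes the intermolecular energy depend on the relative orientation of the two bonds, which is the extra ingredient the molecular theory has to handle.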

The most graphic data on the structure of graphite intercalation
complexes are given by recent diffraction experiments.8 Comparison requires
only the singlet and pair density distributions. While this sort of
information is available from hierarchical equations,11 such an approach would
require an approximate closure or truncation of the hierarchy. Instead, the
one-body distribution and the pair correlations may be approximated by
simultaneously solving the anisotropic Ornstein-Zernike (OZ) equation with an
approximate closure expression and an equation that relates the singlet
distribution function to the pair distribution function. In full form, this is
not completely feasible.

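For an anisotropic system we may write the OZ equation as

$$
h^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = c^{(2)}(\mathbf{r}_1, \mathbf{r}_2) + \int c^{(2)}(\mathbf{r}_1, \mathbf{r}_3)\,\rho(\mathbf{r}_3)\,h^{(2)}(\mathbf{r}_3, \mathbf{r}_2)\,d\mathbf{r}_3 \tag{4}
$$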

where $c^{(2)}$ is the OZ direct correlation function and may be taken to be defined
by this equation. The pair correlation function is denoted by $h^{(2)}$, and $\rho$ is the
$\mathbf{r}$-dependent singlet density distribution function. A wide variety of
approximate closure relations may be used with OZ-like equations. The functional
form of the chosen pair potentials makes the hypernetted chain (HNC)12 closure a
better choice than alternatives such as the Percus-Yevick (PY) relation.11

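The anisotropic HNC closure is given by

$$
c^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = \exp\!\left[-\beta U^{(2)}(\mathbf{r}_1, \mathbf{r}_2) + h^{(2)}(\mathbf{r}_1, \mathbf{r}_2) - c^{(2)}(\mathbf{r}_1, \mathbf{r}_2)\right] - h^{(2)}(\mathbf{r}_1, \mathbf{r}_2) + c^{(2)}(\mathbf{r}_1, \mathbf{r}_2) - 1 \tag{5}
$$

with $\beta = 1/k_B T$, $k_B$ the Boltzmann constant and $T$ the absolute temperature.
The OZ relation (eq. 4) and the HNC closure (eq. 5) thus give an approximate
representation of the pair correlations in an anisotropic fluid, given a
nonconstant $\rho(\mathbf{r})$.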
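The coupled OZ/HNC iteration is easiest to see in the isotropic limit, which is also the kind of reference system used below. The sketch solves the ordinary three-dimensional isotropic OZ equation with the HNC closure by Picard iteration on γ = h - c, using radial sine transforms. This is a swapped-in simplification, not the authors' anisotropic two-dimensional solver; the Lennard-Jones state point (T* = 2, ρ* = 0.3), grid and mixing parameter are illustrative choices:

```python
import numpy as np

def solve_hnc_3d(beta_u, rho, dr, n, mix=0.2, tol=1e-8, max_iter=5000):
    """Picard solution of the isotropic 3D OZ equation with the HNC closure.

    beta_u : callable r -> beta*u(r), the reduced pair potential
    rho    : bulk number density (reduced units)
    dr, n  : grid spacing and size; r_i = i*dr for i = 1..n-1
    Returns r, g(r) and c(r).
    """
    idx = np.arange(1, n)
    r = idx * dr
    dk = np.pi / (n * dr)
    k = idx * dk
    kernel = np.sin(np.pi * np.outer(idx, idx) / n)   # symmetric sine-transform matrix

    def fourier(f):    # F(k) = (4 pi / k) * integral r f(r) sin(kr) dr
        return 4.0 * np.pi * dr / k * (kernel @ (r * f))

    def inverse(f_k):  # f(r) = 1/(2 pi^2 r) * integral k F(k) sin(kr) dk
        return dk / (2.0 * np.pi ** 2 * r) * (kernel @ (k * f_k))

    bu = beta_u(r)
    gamma = np.zeros_like(r)                          # gamma = h - c
    for _ in range(max_iter):
        c = np.exp(-bu + gamma) - 1.0 - gamma         # HNC closure
        c_k = fourier(c)
        gamma_new = inverse(rho * c_k ** 2 / (1.0 - rho * c_k))  # OZ: H = C/(1 - rho C)
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = (1.0 - mix) * gamma + mix * gamma_new
        if converged:
            break
    g = np.exp(-bu + gamma)
    return r, g, g - 1.0 - gamma

# Lennard-Jones fluid in reduced units at T* = 2.0, rho* = 0.3 (illustrative state point)
T_STAR, RHO_STAR = 2.0, 0.3
r, g, c = solve_hnc_3d(lambda x: 4.0 / T_STAR * (x ** -12 - x ** -6), RHO_STAR, dr=0.02, n=512)
```

In k-space the isotropic OZ relation becomes algebraic, H = C/(1 - ρC), so each iteration costs only two transforms; the anisotropic problem of eqs. 4 and 5 loses this simplification, which is why its full solution is so memory- and time-intensive.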

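Calculation for a non-isotropic fluid requires a nontrivial (i.e., nonconstant)
singlet density distribution, and so we may start with the exact
Lovett-Mou-Buff equation13

$$
\nabla_1\left\{\ln\rho(\mathbf{r}_1) + \beta U^{(1)}(\mathbf{r}_1)\right\} = -\int \rho(\mathbf{r}_2)\,\nabla_2 c^{(2)}(\mathbf{r}_1, \mathbf{r}_2)\,d\mathbf{r}_2 \tag{6}
$$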


In principle, eqs. 4, 5 and 6 form a closed set of relations and can be solved
for the one- and two-particle correlation functions. Such a solution is quite
computer-memory intensive and time consuming. In order to circumvent these
numerical difficulties, we consider a perturbation expansion of $c^{(2)}(\mathbf{r}_1, \mathbf{r}_2)$
using an isotropic reference system. We denote quantities relating to the
reference system with a subscript '0'. The reference system is an unmodulated
liquid with a density $\rho_0$ identical to the bulk density of the modulated
liquid. Hence, $\rho(\mathbf{r})$ must satisfy


$$
\frac{1}{V}\int_V \rho(\mathbf{r})\,d\mathbf{r} = \rho_0 \tag{7}
$$

where $V$ is the volume (or area) of integration.

We consider an expansion of the direct correlations about those of the
reference system.14,15 In terms of the isotropic reference system, the
functional Taylor expansion of the direct correlation function is

$$
c^{(2)}(\mathbf{r}_1, \mathbf{r}_2) = c_0^{(2)}(r_{12}) + \int c_0^{(3)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)\,[\rho(\mathbf{r}_3) - \rho_0]\,d\mathbf{r}_3 + \cdots \tag{8}
$$

To form a solution, we neglect higher-order terms involving integrals over
$c_0^{(3)}(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3)$, etc. Such an approximation is not unlike that used in modern
density functional theories of freezing.15-17 In such theories of freezing,
the two-particle direct correlation function for the solid is approximated by
the corresponding quantity for the coexisting liquid. The difference in using
this truncated expansion here, versus in the theories of freezing, is that here
the bulk density and thermodynamic phase of the reference system and the
modulated system are necessarily the same. Thus, substituting

$$
c^{(2)}(\mathbf{r}_1, \mathbf{r}_2) \approx c_0^{(2)}(r_{12})
$$

into eq. 6 we obtain

$$
\nabla_1\left\{\ln\rho(\mathbf{r}_1) + \beta U^{(1)}(\mathbf{r}_1)\right\} = -\int \rho(\mathbf{r}_2)\,\nabla_2 c_0^{(2)}(r_{12})\,d\mathbf{r}_2 \tag{9}
$$

We now invoke the identity

$$
\nabla_2 c_0^{(2)}(r_{12}) = -\nabla_1 c_0^{(2)}(r_{12})
$$

and substitute into eq. 9 to find, upon interchanging integration and
differentiation,

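$$
\nabla_1\left\{\ln\rho(\mathbf{r}_1) + \beta U^{(1)}(\mathbf{r}_1)\right\} = \nabla_1 \int c_0^{(2)}(r_{12})\,\rho(\mathbf{r}_2)\,d\mathbf{r}_2 \tag{10}
$$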
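Integrating eq. 10 once gives ln ρ(r) + βU^(1)(r) = ∫ c0(|r - r'|) ρ(r') dr' + constant, i.e. ρ(r) = A exp[-βU^(1)(r) + ∫ c0(|r - r'|) ρ(r') dr'], with A fixed by the normalization of eq. 7 (the molecular analogue appears as eq. 21). A minimal sketch of solving this self-consistently on a periodic two-dimensional cell by mixed Picard iteration with FFT convolutions, assuming a hypothetical Gaussian model for the reference direct correlation function c0 and an illustrative corrugation strength (neither comes from this work):

```python
import numpy as np

def solve_density_2d(beta_u1, c0_hat, rho0, lx, ly, mix=0.3, tol=1e-10, max_iter=5000):
    """Picard iteration for rho(r) = A exp[-beta*U1(r) + (c0 conv rho)(r)] on a periodic cell.

    beta_u1 : (ny, nx) array of beta*U1 sampled on a uniform grid spanning the cell
    c0_hat  : callable |k| -> 2D Fourier transform of the isotropic reference c0(r)
    rho0    : bulk density; A is fixed so that the cell average of rho equals rho0
    lx, ly  : cell lengths (the cell must be commensurate with the periodic field)
    """
    ny, nx = beta_u1.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=lx / nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=ly / ny)
    c_k = c0_hat(np.sqrt(kx[None, :] ** 2 + ky[:, None] ** 2))
    rho = np.full_like(beta_u1, rho0, dtype=float)
    for _ in range(max_iter):
        # integral of c0(|r - r'|) rho(r') dr' for a periodic rho, evaluated with FFTs
        conv = np.real(np.fft.ifft2(np.fft.fft2(rho) * c_k))
        trial = np.exp(-beta_u1 + conv)
        trial *= rho0 / trial.mean()                 # normalization constant A (eq. 7)
        if np.max(np.abs(trial - rho)) < tol:
            return trial
        rho = (1.0 - mix) * rho + mix * trial
    return rho

# Hypothetical inputs (angstrom units): graphite-registered field with beta*K = 0.5 and a
# Gaussian model c0(r) = -A0 exp(-r^2 / 2s^2), whose 2D transform is -2 pi A0 s^2 exp(-k^2 s^2 / 2).
a = np.sqrt(3.0) * 1.42
lx, ly, nx, ny = 4 * a, 2 * np.sqrt(3.0) * a, 64, 64
x, y = np.meshgrid(np.arange(nx) * lx / nx, np.arange(ny) * ly / ny)
gx, gy = 2 * np.pi / a, 2 * np.pi / (np.sqrt(3.0) * a)
beta_u1 = 2 * 0.5 * (2 * np.cos(gy * y) * np.cos(gx * x) + np.cos(2 * gy * y))
A0, S = 1.0, 2.0
rho = solve_density_2d(beta_u1, lambda k: -2 * np.pi * A0 * S**2 * np.exp(-(k * S)**2 / 2),
                       rho0=0.05, lx=lx, ly=ly)
```

Because the density is periodic, the convolution with c0 reduces to a product in Fourier space, so each iteration costs two FFTs instead of a full two-dimensional integral; the mixing step damps the oscillations that a negative c0 would otherwise cause.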
where $\log z^*(\mathbf{r})$ may be related to an external activity or potential field, and
$\delta(1,2) = \delta(|\mathbf{r}_1 - \mathbf{r}_2|)$. For the multicomponent fluid case, we define

$$
\frac{\delta\rho_i(\mathbf{r}_1)}{\delta\log z_j^*(\mathbf{r}_2)} = \rho_i(\mathbf{r}_1)\,\delta_{ij}(\mathbf{r}_1, \mathbf{r}_2) + \rho_i(\mathbf{r}_1)\,\rho_j(\mathbf{r}_2)\,h_{ij}(\mathbf{r}_1, \mathbf{r}_2) \tag{15}
$$

where the notational extension is obvious. Here, we introduce the possibility
of rigid (covalent) chemical bonding in a neat fluid via $\delta_{ij}(1,2) = \delta(|1 - 2| - l)$,
with the convention that when $i = j$, $l = 0$; otherwise, if there is a chemical
bond, $l = l_0$, the chemical bond length between atoms (or sites) $i$ and $j$. Since
$\rho_i(\mathbf{r})$ is a function of both $\log z_i^*(\mathbf{r})$ and $\log z_j^*(\mathbf{r})$, we have

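$$
\delta\log\rho_i(1) = \delta\log z_i^*(1) + \int \delta_{ij}(1,2)\,\delta\log z_j^*(2)\,d2 + \int h_{ii}(1,2)\,\rho_i(2)\,\delta\log z_i^*(2)\,d2 + \int h_{ij}(1,2)\,\rho_j(2)\,\delta\log z_j^*(2)\,d2 \tag{16}
$$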

We  now  let  an  overline  denote  a  vector  and  square  brackets  represent  a  matrix. 
Since  the  Fourier  transform  and  the  functional  derivative  operation  commute 
for  well-behaved  distributions,  we  have 


$$
\delta\log\bar{\rho} = \delta\log\bar{z}^* + [s] * \delta\log\bar{z}^* + [h] * [\rho]\,\delta\log\bar{z}^* \tag{17}
$$

where $[s]$ is the bond or intramolecular correlation function matrix,10 and $[\rho]$
is a diagonal matrix. Here, $*$ denotes a matrix convolution product. From here
we may use the fact that the third term is of higher order in $\rho$, which is small,
and can be approximated in a mean-field fashion by its average. This leads
directly to

$$
\delta\log\bar{\rho} = (I + [s] + \rho_0[h]) * \delta\log\bar{z}^* \tag{18}
$$

$$
= ([w] + \rho_0[h]) * \delta\log\bar{z}^* \tag{19}
$$
where $I + [s] = [w]$ and $I$ is the identity matrix.

From here we use the OZ-like RISM expression10 to introduce the isotropic
molecular site-site pair correlations:

$$
[h] = [w] * [c] * [w] + \rho_0\,[w] * [c] * [h] \tag{20}
$$
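Because eq. 20 is a convolution equation, it becomes a set of small matrix equations in Fourier space, [h](k) = (I - ρ0[w][c])⁻¹[w][c][w] at each wavevector k. A minimal sketch, assuming a rigid homonuclear diatomic in two dimensions, for which the off-diagonal element of [w] is the Bessel function J0(kl); the site-site c(k) in the example is hypothetical, not a converged result:

```python
import numpy as np

def rism_h(c_k, w_k, rho0):
    """Solve [h] = [w][c][w] + rho0 [w][c][h] independently at each k.

    c_k, w_k : arrays of shape (nk, m, m) with the Fourier-space site-site direct
               correlation and intramolecular (bond) matrices
    Returns [h] in Fourier space, same shape.
    """
    m = c_k.shape[-1]
    wc = w_k @ c_k                                  # batched matrix products over k
    return np.linalg.solve(np.eye(m) - rho0 * wc, wc @ w_k)

def j0(x, n_theta=200):
    """Bessel J0 from its integral form (1/pi) * int_0^pi cos(x sin t) dt (midpoint rule)."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    return np.cos(np.multiply.outer(x, np.sin(theta))).mean(axis=-1)

# Rigid homonuclear diatomic in 2D with bond length l: w_12(k) = J0(k l).
k = np.linspace(0.0, 10.0, 201)
l_bond, rho0 = 1.2, 0.3
w_k = np.empty((k.size, 2, 2))
w_k[:, 0, 0] = w_k[:, 1, 1] = 1.0
w_k[:, 0, 1] = w_k[:, 1, 0] = j0(k * l_bond)
c_k = -0.5 * np.exp(-k ** 2)[:, None, None] * np.ones((2, 2))   # hypothetical site-site c(k)
h_k = rism_h(c_k, w_k, rho0)
```

Solving the linear system at each k, rather than inverting matrices explicitly, keeps the step cheap and numerically stable; in a full calculation c(k) would come from the HNC closure and the two steps would be iterated together.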

After an integration by parts of eq. 11 and using eq. 14, the molecular
density equation in matrix form is given by

$$
\bar{\rho} = A\exp\left(-\beta\bar{U}^{(1)}(\mathbf{r}) + [w] * [c] * \bar{\rho}\right) \tag{21}
$$

where $A$ is the normalization constant fixed by the density, as in the atomic
case.

Again,  we  may  use  the  HNC  equation  in  matrix  element  form.12 

IV.  RESULTS 

Briefly, we present a comparison of the results for atomic (ionic)
intercalant fluids with those for diatomic fluids. The predicted $S(k_x, k_y)$ for the
atomic case corresponding to second-stage Rb-graphite is shown below.


The periodic density waves in the Rb ions due to the modulation by the graphite are evident in the delta-function-like features. Interference of the simple liquid structure, or Debye-Scherrer rings, with the density waves causes the halos about each of the sharp density-wave features. The size and placement of these features are in excellent agreement with those measured experimentally.⁸


In the case of diatomics, there is another length scale in the problem. Besides the size of the intercalant atoms and the underlying graphite lattice, there is the length of the bond in the diatomic intercalating fluid. This extra length scale competes with the scales defined in the previous picture to change the overall intensity pattern. For physically reasonable bond lengths, we find that this competition has a dramatic effect on the predicted intensity pattern.¹⁸ Below, we show the results for a system corresponding to intercalated O2.


The presence of the other length scale is seen in the new features not seen in the previous picture, as well as in the diminution of the features overall. The reduction in intensity is primarily due to destructive interference in the density waves for the fluid atoms of the system.
CONCLUSIONS

The coordinate-dependent free energies and the concomitant density distributions for two physically different intercalating fluids have been calculated and compared. We predict that the extra length scales found in some molecular systems will diminish the intensity patterns, even for quite dense fluids. Our analysis traces this to a competition between the features arising from the length scale responsible for chemical bonding and those arising from packing and from the graphite-induced density waves.

Our approach has been based on density functional theory using integral equations as input. While such methods are good for structure, dynamics cannot be studied in this way. In future work, we shall employ constant-energy molecular dynamics and constant-chemical-potential dynamics to study these systems further, structurally as well as dynamically.
DISCUSSION


SOUMPASIS - Do you assume that the 2-D liquid 'sheets' are completely decoupled through the graphite layers ?

PETTITT  -  No,  but  that  information  has  been  integrated  out  of  the  picture.  Thus,  it 
appears  only  in  the  one-body  potential  we  use.  The  potential  was,  however,  obtained 
for  2nd  stage  Rb+  -  graphite  (which  has  a  specific  coupling  or  staging  of  the  layers) 
from  the  intensity  of  the  Bragg  spots  induced  by  the  liquid  density  waves.  While 
averaged  effects  are  there,  we  cannot  ask  specific  questions  about  correlations 
between  layers  with  our  current  model. 


BUCKINGHAM  -  Is  there  a  threshold  in  the  concentration  of  alkali  atoms  in  the  graphite 
intercalation  materials  ?  If  the  layers  of  carbon  atoms  remain  approximately  parallel, 
then  it  would  seem  that  there  will  be  a  threshold  concentration  of  intercalate. 

PETTITT - Yes, there are in fact several thresholds. These are, again, related to this phenomenon of staging. There is a phase transition from one stage to another, and each new stage brings with it a new set of physical properties such as color, conductivity, etc.


MONQUE - Because in your model you talk about the "tailored layer expansion factor" and about the way you can calculate it, you've made me think of its application to the synthesis or preparation of pillared clays (catalytic materials built with various cations in interlayer positions, with specific shape selectivity as a function of the interlayer space). So can you please tell me how I can use this model experimentally in the synthesis of this type of material ?

PETTITT  -  Pillared  clays  and  some  zeolites  are  certainly  good  examples  of  solid 
supports  with  liquid  accessibility  for  catalysis.  Our  understanding  of  even  these  simple 
graphite  intercalation  systems  is  not  yet  sufficient  to  make  thermodynamic  suggestions. 
Hopefully,  we  shall  someday  understand  the  layer  spacing  free  energy  well  enough  to 
make  such  suggestions. 
SUMMARY

The lecture highlights the efficiency of the “back-of-the-envelope” BOC-MP (bond-order conservation Morse-potential) model in calculating reaction energetics on metal surfaces, particularly heats of adsorbate chemisorption Q and activation barriers ΔE* for adsorbate dissociation and recombination. After representative calculations of Q and ΔE* for a variety of diatomic and polyatomic admolecules, we discuss an example of a complex heterogeneous catalytic process: transformations of C2 hydrocarbons on transition-metal surfaces. It appears that BOC-MP modeling provides an effective means for understanding and projecting surface reactivity.


INTRODUCTION 

The worldwide value of products made via catalytic technology is in excess of one trillion dollars per year, and heterogeneous catalysis leads the way. This tremendous economic impact drives intensive efforts in both industry and academia to understand heterogeneous catalytic processes at the molecular level. Understanding the mechanisms of chemical reactions is impossible without knowledge of reaction energetics. Ultimately, these energetics should be calculated by quantum-mechanical techniques. So far, however, progress has been impressive for gas-phase reactions but rather modest for surface reactions.

While  waiting  for  the  advent  of  efficient  quantum-mechanical  techniques, 
common  sense  requires  one  to  look  for  working  alternatives.  If  not  microscopic 
modeling,  let  it  be  phenomenological.  To  have  a  chance  to  succeed,  such  a 
phenomenological  model  should  have  a  rigorous  mathematical  framework  and  make 
use  of  well-defined  parameters,  preferably  only  observable  ones.  In  order  to  be 
practical,  theoretical  constructs  should  be  simple  enough,  ideally  at  a 
“back-of-the-envelope”  level. 

Over the last several years, we have been developing such a phenomenological construct, namely, the bond-order conservation Morse-potential (BOC-MP) model (refs. 1-3). The model is based on four assumptions:

(1) Each two-center metal-adsorbate M-A interaction is described by the Morse potential

$$E(x) = -Q(x) = -Q_0\,(2x - x^2) \tag{1}$$

where x is the M-A bond order,

$$x = \exp[-(r - r_0)/a] \tag{2}$$

an exponential function of the M-A distance r ($r_0$ and $a$ are constants), and $Q_0$ is the M-A equilibrium bond energy. Obviously, the total energy E(x) (Eq. 1) has only one minimum, $E = -Q_0$, at the equilibrium distance $r_0$, where by definition (Eq. 2) the bond order is x = 1.

(2) For an adatom A interacting with n metal atoms M ($M_n$-A), the n two-center M-A interactions are additive.

(3) For a given $M_n$-A, n is limited to nearest neighbors.

(4) For X = A or AB, along a migration path up to dissociation, the total $M_n$-X bond order is conserved and normalized to unity.
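As a quick numerical illustration of Eqs. 1 and 2, the Python sketch below evaluates the Morse energy through the bond order x. The function names are ours, and the constants in the usage lines are arbitrary placeholders, not fitted M-A parameters. Written this way, $E(x) = Q_0[(1 - x)^2 - 1]$, so the single minimum $E = -Q_0$ lies at x = 1, that is, at $r = r_0$.

```python
import math


def bond_order(r: float, r0: float, a: float) -> float:
    """Eq. 2: x = exp[-(r - r0)/a]; x = 1 at the equilibrium distance r0."""
    return math.exp(-(r - r0) / a)


def morse_energy(x: float, q0: float) -> float:
    """Eq. 1: E(x) = -Q(x) = -Q0 (2x - x^2)."""
    return -q0 * (2.0 * x - x * x)


def morse_energy_at_distance(r: float, r0: float, a: float, q0: float) -> float:
    """Compose Eqs. 1 and 2 to get the two-center energy as a function of r."""
    return morse_energy(bond_order(r, r0, a), q0)


if __name__ == "__main__":
    # Hypothetical constants: Q0 = 100 kcal/mole, r0 = 2.0, a = 0.5 (length units)
    for r in (1.8, 2.0, 2.5, 4.0):
        print(r, round(morse_energy_at_distance(r, 2.0, 0.5, 100.0), 2))
```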

The assumptions (1)-(4) are the rules of the game. The rest is straightforward algebra: maximizing the total $M_n$-X bond energy under the BOC condition (4), whose analytic form depends on the nature of X and the geometry of $M_n$.

The aim of the present lecture is to give a flavor of how the BOC-MP model treats the energetics of diatomic and polyatomic adsorbates on transition-metal surfaces, particularly the heat of chemisorption Q and the activation barriers ΔE* for dissociation and recombination. Because of space constraints, the analytic formulas used to calculate Q and ΔE* will be given with minimal explanation, and only a few sources of the experimental data employed will be cited. For a detailed discussion and a complete list of references, the reader is referred to the review (ref. 3).

THE  BOC-MP  FORMALISM 
Heats  of  Chemisorption 

For atomic chemisorption, the $M_n$-A bond energy $Q_n$ increases monotonically with n as

$$Q_A = Q_n = Q_{0A}\,(2 - 1/n) \tag{3}$$

where $Q_{0A}$ is the maximum M-A two-center bond energy (cf. Eq. 1). The value of $Q_n$ reaches its absolute maximum in the hollow n-fold site, so the observed heat of atomic chemisorption can be identified with $Q_n$.
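Eq. 3 follows from assumptions (2)-(4). The n two-center contributions $Q(x_i) = Q_0(2x_i - x_i^2)$ are summed subject to $\sum_i x_i = 1$. Because Q(x) is concave, the maximum lies at equal bond orders $x_i = 1/n$, which gives $nQ_0(2/n - 1/n^2) = Q_0(2 - 1/n)$. The Python sketch below checks this by brute force on a grid of bond-order partitions. The grid search is only an illustrative check we chose, not the variational procedure of the model. $Q_0 = 100$ and the function names are placeholders.

```python
import itertools
import math


def morse_q(x: float, q0: float) -> float:
    """Two-center bond energy Q(x) = Q0 (2x - x^2), i.e. -E(x) from Eq. 1."""
    return q0 * (2.0 * x - x * x)


def atomic_heat(q0: float, n: int) -> float:
    """Eq. 3: Q_n = Q0 (2 - 1/n)."""
    return q0 * (2.0 - 1.0 / n)


def best_partition_on_grid(q0: float, n: int, steps: int = 12):
    """Maximize sum_i Q(x_i) over x_i >= 0 with sum_i x_i = 1 (the BOC
    condition), trying only bond orders that are multiples of 1/steps.
    Returns (maximum total bond energy, bond orders at the maximum)."""
    best_energy, best_orders = -math.inf, None
    for head in itertools.product(range(steps + 1), repeat=n - 1):
        remainder = steps - sum(head)
        if remainder < 0:
            continue  # this partition would violate sum x_i = 1
        orders = [k / steps for k in (*head, remainder)]
        energy = sum(morse_q(x, q0) for x in orders)  # assumption (2): additivity
        if energy > best_energy:
            best_energy, best_orders = energy, orders
    return best_energy, best_orders


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        energy, orders = best_partition_on_grid(100.0, n)
        print(n, round(energy, 2), round(atomic_heat(100.0, n), 2), orders)
```

The default grid of 12 steps is divisible by n = 1 to 4, so the equal partition is always among the candidates.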

For molecular chemisorption, one should distinguish between weak and strong $M_n$-AB bonding. The weakly bound AB molecules typically have a closed electronic shell, e.g., H2, N2, CO, NH3, H2O, or unpaired electrons occupying substantially delocalized molecular orbitals, e.g., NO or O2. The strongly bound AB molecules have unpaired electrons retaining their atomic character, e.g., CH, CH2, NH, OH, or OCH3.

Analytically, the difference between weak and strong $M_n$-AB bonding is reflected in the use of different effective Morse constants, $Q_{0A}$ and $Q_A$, respectively.

For weak $M_n$-AB bonding, the simplest case corresponds to AB perpendicular to the surface with the A end down, when, to a first approximation, the $M_n$-B bond order can be neglected. For such mono(η¹) coordination via A with $M_n$ (η¹μn), the variational procedure gives

$$Q_{AB,n} = \frac{Q_{0A}^2}{Q_{0A}/n + D_{AB}} \qquad \text{for } D_{AB} \geq Q_{0A}/n \tag{4}$$

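If AB is coordinated via both A and B (dicoordination η²), the bonding energy becomes

$$Q_{AB} = \frac{ab(a + b) + D_{AB}(a - b)^2}{ab + D_{AB}(a + b)} \tag{5}$$

with

$$a = \frac{Q_{0A}^2\,(Q_{0A} + 2Q_{0B})}{(Q_{0A} + Q_{0B})^2}, \qquad b = \frac{Q_{0B}^2\,(Q_{0B} + 2Q_{0A})}{(Q_{0A} + Q_{0B})^2}$$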

For a homonuclear A2 ($a = b = (3/4)Q_{0A}$), Eq. 5 reduces to

$$Q_{A_2} = \frac{(9/2)\,Q_{0A}^2}{3Q_{0A} + 8D_{A_2}} \tag{6}$$
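The weak-bonding formulas (Eqs. 4-6) are easy to evaluate directly. The Python sketch below obtains $Q_{0A}$ by inverting Eq. 3 for a threefold hollow site (n = 3). That choice is our assumption, but it reproduces the Table 1 entries we checked. The function names are hypothetical. For CO on Ni(111), with Q_C = 171 and D_CO = 257 kcal/mole from Table 1, it gives $Q_{0C} = 102.6$ and, with n = 1 in Eq. 4, about 29 kcal/mole. The test also confirms that Eq. 5 with equal constants reduces to Eq. 6.

```python
def q0_from_atomic_heat(q_atomic: float, n: int = 3) -> float:
    """Invert Eq. 3, Q_A = Q0A (2 - 1/n); n = 3 (hollow site) is an assumption."""
    return q_atomic / (2.0 - 1.0 / n)


def q_weak_mono(q0a: float, d_ab: float, n: int) -> float:
    """Eq. 4: weak eta1 bonding of AB via A to n metal atoms."""
    return q0a ** 2 / (q0a / n + d_ab)


def q_weak_di(q0a: float, q0b: float, d_ab: float) -> float:
    """Eq. 5: weak eta2 (di)coordination via both A and B."""
    s = (q0a + q0b) ** 2
    a = q0a ** 2 * (q0a + 2.0 * q0b) / s
    b = q0b ** 2 * (q0b + 2.0 * q0a) / s
    return (a * b * (a + b) + d_ab * (a - b) ** 2) / (a * b + d_ab * (a + b))


def q_weak_homonuclear(q0a: float, d_a2: float) -> float:
    """Eq. 6: Eq. 5 for a homonuclear A2, where a = b = (3/4) Q0A."""
    return 4.5 * q0a ** 2 / (3.0 * q0a + 8.0 * d_a2)


if __name__ == "__main__":
    q0c = q0_from_atomic_heat(171.0)             # C on Ni(111), Table 1 input
    print(round(q_weak_mono(q0c, 257.0, 1), 1))  # eta1 CO on Ni(111)
```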

For strong $M_n$-AB bonding in the monocoordination η¹μn, the variational procedure now leads to the expression

$$Q_{AB} = \frac{Q_A^2}{Q_A + D_{AB}} \tag{7}$$

which is an analog of Eq. 4 (for n = 1) where $Q_{0A}$ is replaced by $Q_A$. Along with weak and strong $M_n$-AB bonding, one can imagine an intermediate case, which may be described by interpolating between the two extremes. In particular, for the monocoordination (η¹μn) $M_n$-AB, one can simply average Eqs. 4 and 7 as

$$Q_{AB} = \frac{1}{2}\left[\frac{Q_{0A}^2}{Q_{0A}/n + D_{AB}} + \frac{Q_A^2}{Q_A + D_{AB}}\right] \tag{8}$$

The intermediate $M_n$-AB bond strength is expected for monovalent radicals AB where A is a tri- or tetravalent atom, say N or C in NH2, CH3, HCO.
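Eqs. 7 and 8 can be sketched the same way in Python; the helper names are hypothetical. The example inputs come from the tables that follow. The first is OH on Pt(111), with Q_O = 85 and D_OH = 102 kcal/mole (Table 1). The second is CH3 on Pt(111). For CH3 we assume D_CH3 = 293 kcal/mole (the gas-phase CH3 + C entry of Table 3), Q_C = 150 kcal/mole, and n = 3. These inputs give about 39 and 38 kcal/mole, in line with Table 1 and with half of the CH3 + CH3 chemisorption entry of Table 3.

```python
def q0_from_atomic_heat(q_atomic: float, n: int = 3) -> float:
    """Invert Eq. 3 (n = 3, a hollow site, is an assumption)."""
    return q_atomic / (2.0 - 1.0 / n)


def q_strong_mono(q_a: float, d_ab: float) -> float:
    """Eq. 7: strong monocoordination, Eq. 4 (n = 1) with Q0A replaced by QA."""
    return q_a ** 2 / (q_a + d_ab)


def q_intermediate_mono(q_a: float, d_ab: float, n: int = 3) -> float:
    """Eq. 8: arithmetic mean of the weak (Eq. 4) and strong (Eq. 7) limits."""
    q0a = q0_from_atomic_heat(q_a, n)
    weak = q0a ** 2 / (q0a / n + d_ab)
    return 0.5 * (weak + q_strong_mono(q_a, d_ab))


if __name__ == "__main__":
    print(round(q_strong_mono(85.0, 102.0)))        # OH on Pt(111)
    print(round(q_intermediate_mono(150.0, 293.0)))  # CH3 on Pt(111)
```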

Dissociation  and  Recombination  Barriers 

If AB approaches the surface from the gas phase, the multidimensional activation barrier $\Delta E^*_{AB,g}$ for dissociation AB_g → A_s + B_s depends explicitly on the chemisorption energies of all adsorbates; namely, the barrier can be approximated as

$$\Delta E^*_{AB,g} = \tfrac{1}{2}\left(\Delta E^{0}_{AB,g} - Q_{AB}\right) \tag{9}$$

where the one-dimensional barrier $\Delta E^{0}_{AB,g}$, rigorously defined by the variational procedure, is

$$\Delta E^{0}_{AB,g} = D_{AB} - (Q_A + Q_B) + \frac{Q_A Q_B}{Q_A + Q_B} \tag{10}$$

Obviously, the dissociation barrier $\Delta E^*_{AB,s}$ from a chemisorbed state will be larger than $\Delta E^*_{AB,g}$ (one- or multidimensional) by exactly the molecular heat of chemisorption $Q_{AB}$:

$$\Delta E^*_{AB,s} = \Delta E^*_{AB,g} + Q_{AB} \tag{11}$$

Combining with Eq. 9, we obtain

$$\Delta E^*_{AB,s} = \tfrac{1}{2}\left(\Delta E^{0}_{AB,g} + Q_{AB}\right) \tag{12}$$
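A minimal Python sketch of Eqs. 9-12 follows; the function names are hypothetical. With the Table 2 inputs for H2 on Fe(111) (D_H2 = 104, Q_H = 62, Q_H2 = 7 kcal/mole), Eq. 10 gives a one-dimensional barrier of 11 kcal/mole. Eq. 9 then gives 2 kcal/mole, the calculated value listed in Table 2.

```python
def one_dim_barrier(d_ab: float, q_a: float, q_b: float) -> float:
    """Eq. 10: one-dimensional gas-phase dissociation barrier."""
    return d_ab - (q_a + q_b) + q_a * q_b / (q_a + q_b)


def gas_dissociation_barrier(d_ab: float, q_a: float, q_b: float, q_ab: float) -> float:
    """Eq. 9: multidimensional barrier for AB_g -> A_s + B_s."""
    return 0.5 * (one_dim_barrier(d_ab, q_a, q_b) - q_ab)


def surface_dissociation_barrier(d_ab: float, q_a: float, q_b: float, q_ab: float) -> float:
    """Eqs. 11-12: the barrier from chemisorbed AB_s is larger by Q_AB."""
    return gas_dissociation_barrier(d_ab, q_a, q_b, q_ab) + q_ab


if __name__ == "__main__":
    print(one_dim_barrier(104.0, 62.0, 62.0))                 # H2 on Fe(111), Eq. 10
    print(gas_dissociation_barrier(104.0, 62.0, 62.0, 7.0))   # Eq. 9
```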

In general, for the reverse reaction of recombination, the activation barriers can be calculated from the relevant thermodynamic relations. Specifically, for the recombination of chemisorbed A_s and B_s to chemisorbed AB_s or gas-phase AB_g, the activation barriers $\Delta E^*_{A-B,s}$ and $\Delta E^*_{A-B,g}$ may be the same or different depending on the sign of the gas-phase dissociation barrier $\Delta E^*_{AB,g}$, namely,

$$\Delta E^*_{A-B,s} = \Delta E^*_{A-B,g} = Q_A + Q_B - D_{AB} + \Delta E^*_{AB,g} \qquad \text{if } \Delta E^*_{AB,g} > 0 \tag{13}$$

or

$$\Delta E^*_{A-B,g} = \Delta E^*_{A-B,s} - \Delta E^*_{AB,g} = Q_A + Q_B - D_{AB} \qquad \text{if } \Delta E^*_{AB,g} < 0 \tag{14}$$

For some reactions, for example, CO_s + O_s → CO2,s or NO_s + N_s → N2O_s, the one-dimensional treatment is satisfactory (ref. 3). In this case, the recombination barrier is particularly simple (cf. Eqs. 10 and 13):

$$\Delta E^*_{A-B} = \frac{Q_A Q_B}{Q_A + Q_B} \tag{15}$$


Here, if $Q_A \gg Q_B$, $\Delta E^*_{A-B}$ will be close to $Q_B$, the heat of chemisorption of the more weakly bound partner.
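Eqs. 13-15 can be sketched in Python as follows; the function names are hypothetical. `recombination_barriers` returns the barriers toward chemisorbed and gas-phase AB. Substituting the one-dimensional barrier of Eq. 10 into the expression for the chemisorbed channel collapses it to Eq. 15, and the test checks this. The inputs are taken from Table 2: CO_s + O_s on Pt(111) with Q_CO = 32, Q_O = 85 and D = 127 kcal/mole, and on Re(001) with Q_CO = 29 and Q_O = 127 kcal/mole.

```python
def one_dim_barrier(d_ab: float, q_a: float, q_b: float) -> float:
    """Eq. 10."""
    return d_ab - (q_a + q_b) + q_a * q_b / (q_a + q_b)


def recombination_barriers(d_ab: float, q_a: float, q_b: float, de_ab_g: float):
    """Eqs. 13-14: barriers for A_s + B_s -> AB_s and for A_s + B_s -> AB_g.

    de_ab_g is the gas-phase dissociation barrier of AB.
    """
    to_surface = q_a + q_b - d_ab + de_ab_g
    if de_ab_g >= 0:
        # Eq. 13: the transition state lies above gas-phase AB
        return to_surface, to_surface
    # Eq. 14: the gas-phase AB level, not the transition state, sets the barrier
    return to_surface, q_a + q_b - d_ab


def one_dim_recombination_barrier(q_a: float, q_b: float) -> float:
    """Eq. 15."""
    return q_a * q_b / (q_a + q_b)


if __name__ == "__main__":
    print(round(one_dim_recombination_barrier(32.0, 85.0)))  # CO_s + O_s on Pt(111)
```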

In conclusion, two general points should be stressed. First, in Eqs. 4-15, A and B may be either atoms or atomic groups, so the same formalism is used to calculate Q and ΔE* for both diatomic and polyatomic molecules (at the zero-coverage extreme). Second, the BOC-MP analytic interrelations make use of observables only, namely, the heats of atomic chemisorption ($Q_A$, $Q_B$) and various constants: thermodynamic ($D_{AB}$), structural (n), and numerical (coefficients). For these reasons, the BOC-MP model can treat a remarkably broad variety of admolecules, and comparison of the model projections with experiment is particularly straightforward. Consider now some representative examples.

BOC-MP  Applications 


Table 1 lists heats of chemisorption of various admolecules: diatomic and polyatomic, mono- and dicoordinated, weakly and strongly bound. Agreement with experiment is excellent, typically within 1-2 kcal/mole.

Table 2 summarizes the activation barriers of some well-studied dissociation and recombination reactions, which are in good agreement with experiment. In particular, for the recombination reactions CO_s + O_s → CO2,s and NO_s + N_s → N2O_s, the activation barriers closely follow the values of $Q_{CO}$ and $Q_{NO}$, respectively, and not those of $Q_O$ and $Q_N$, as Eq. 15 predicts.
TABLE 1

Initial Heats of Molecular Chemisorption Q_AB (a)

                                     Experimental values of        Q_AB
Surface    Coord.   AB               Q_A    Q_B    D_AB (b)      Calcd.    Exp.
Ni(111)    η¹       CO               171           257           29 (c)    27
Pt(111)    η¹       NO               116           150           26 (d)    27
Pd(111)    η¹       NO               130           150           32 (d)    31
Pt(111)    η¹       NH3              116           279           13 (c)    12-15
Ni(111)    η¹       NH3              135           279           18 (c)    20
Pt(111)    η¹       OH2              85            220           11 (c)    12
Pt(111)    η¹       O=CH2            85            176           11 (c)    11
Pt(111)    η²       O2               85     85     119           11 (e)    9
Ru(001)    η²       O=C(CH3)2        100    67     179           15 (f)    16
Ni(111)    η²       H2C=CH2          171    171    355           14 (e)    13
Pt(111)    η¹       OH               85            102           39 (g)    36-45
Pt(111)    η¹       NH               116           102           68 (g)    63-69

(a) See Ref. 3 for sources of the experimental values of Q. All energies in kcal/mole.
(b) Ref. 4.
(c) Equation 4 for n = 1.
(d) Equation 4 for n = 2.
(e) Equation 6.
(f) Equation 5.
(g) Equation 7.


TABLE 2

Dissociation and Recombination Barriers ΔE* for Some Surface Reactions (a)

                                         Experimental values of             ΔE*
Reaction               Surface     D_AB (b)   Q_A    Q_B    Q_AB       Calcd.     Exp.
H2,g → Hs + Hs         Fe(111)     104        62     62     7          2 (c)      0
                       Ni(111)                63     63     7          1 (c)      2
                       Cu(100)                56     56     5          7 (c)      5
N2,g → Ns + Ns         Fe(110)     228        138    138    8          6 (c)      8
                       Fe(100)                140    140    8          4 (c)      2
                       Fe(111)                139    139    8          5 (c)      0
COg → Cs + Os          Ni(111)     257        171    115    27         6 (c)
                       Ni(100)                171    130    30         0 (c)      -3
                       W(110)                 200    125    21         -6 (c)     -15
NOg → Ns + Os          Rh(100)     151        128    102    26         -23 (c)    -15
                       Pt(111)                116    85     27         -14 (c)
CO2,g → COs + Os       Rh(111)     127        32     102               17 (d)     17
                       Re(001)                29     127               -5 (d)     <0
COs + Os → CO2,g       Rh(111)                32     102               24 (d)     27
                       Pd(111)                34     87                24 (d)     25
                       Pt(111)                32     85                23 (d)     25
                       Ag(110)                6.5    80                6.0 (d)    5.3
NOs + Ns → N2Os (e)    Rh(111)                26     128               22 (d)     21
                       Rh(100)                25     131               21 (d)     21
                       Pt(111)                27     116               22 (d)     20

(a) See Ref. 3 for sources of the experimental values of Q and ΔE*. All energies in kcal/mole.
(b) Ref. 4.
(c) Equation 9.
(d) Equations 10 or 15, respectively.
(e) Followed by nonactivated decomposition N2Os → N2,g + Os.

As an example of a complex heterogeneous catalytic process with several competing pathways, we choose the surface reactions of C2Hx hydrocarbons, which have drawn a great deal of interest because of their fundamental and practical importance.
Naturally, as chemists, we are mostly interested in periodic (relative) changes in the catalytic behavior of transition metals. Specifically, we select the sequence from Pt to Ni and further to Fe or W, the latter two being simulated as the same model metal Fe/W (with the model parameters averaged over the close parameters of real Fe and W). The BOC-MP calculations have been made for the smoothest (most densely packed) surfaces, namely fcc Pt(111), Ni(111), and bcc Fe/W(110).

Here there are two problems of special interest: (1) the thermochemistry of C2Hx species in the chemisorbed vs gas-phase state, and (2) the effects of metal composition and of the structure of C2Hx species on the activation energy for C-H and C-C bond cleavage.

Table 3 lists total bond energies in the gas-phase (D) and chemisorbed (D + Q) states for all C2Hx species (x = 0-6). Some comparisons of enthalpies in the gas-phase vs chemisorbed states are made in Table 4. The calculated activation barriers ΔE* for C-C and C-H bond cleavage and recombination of chemisorbed C2Hx species are summarized in Table 5. All the discussion below refers to chemisorbed species unless stated otherwise.

Of the general model conclusions, the most important is the following: many reorganizations of C2Hx species that are highly endothermic in the gas phase typically become exothermic on transition-metal surfaces. For example, in the gas phase the ground state of C2H3 is vinyl H2C=CH, which lies 45 kcal/mole below ethylidyne H3C-C (the excited state). Upon chemisorption, this isomerization becomes exothermic (ΔH = -8, -15, -25 kcal/mole for Pt, Ni, Fe/W, respectively). Similarly, the isomerization of acetylene to vinylidene, HC≡CH → H2C=C, is highly endothermic in the gas phase (ΔH = 44 kcal/mole) but becomes exothermic by 13-41 kcal/mole on the metal surfaces studied. This explains why H3CC and H2CC are often observed in the chemisorbed state (unlike the gas phase).
TABLE 3

Total Bond Energies in the Gas-Phase (D) and Chemisorbed (D + Q) States of C2Hx on Pt(111), Ni(111), and Fe/W(110) (a)

                                  Q_C2Hx (c)              D_C2Hx + Q_C2Hx + (8 - x)Q_H (d)
C2Hx          D_C2Hx (b)     Fe/W    Ni     Pt         Fe/W    Ni     Pt
H3C-CH3       674            6       5      5          812     805    801
H3C-CH2       576            64      49     39         838     814    798
H3C-CH        466            107     85     70         837     803    780
H2C=CH2       538            20      15     12         822     805    794
H3C-C         376            141     115    97         847     806    778
H2C=CH        421            71      55     44         822     791    770
H2C=C         348            110     87     71         854     813    785
HC≡CH         392            25      18     14         813     788    772
HC≡C          259            106     84     69         827     784    755
CH3 + CH3     586            124     96     76         842     808    784
CH3 + CH2     476            166     131    106        840     796    765
CH3 + CH      374            204     164    135        842     790    753
CH3 + C       293            262     219    188        885     827    786
CH2 + CH2     366            208     166    136        838     784    746
CH2 + CH      264            246     199    165        840     778    734
CH2 + C       183            304     254    218        883     815    767
CH + CH       162            284     232    194        842     772    722
CH + C        81             342     287    247        885     809    755
C + C         0              400     342    300        928     846    788
CH4 + CH4     796            12      12     12         808     808    808

(a) The parameters used: Q_C = 150, 171, 200 and Q_H = 61, 63, 66 for Pt, Ni, Fe/W, respectively (ref. 3). All energies in kcal/mole.
(b) From Ref. 4 with corrections and additions specified in ref. 3.
(c) Equations 4, 7 or 8.
(d) Normalized for the stoichiometry C2H8 (C2H6 + H2 → 2CH4): for C2Hx (or CHy + CHx-y), the remaining (8 - x) H atoms are assumed to be atomically chemisorbed.


TABLE 4

Enthalpies of C2Hx Isomerization Reactions: Gas-Phase vs Chemisorbed States

                                              ΔH, kcal/mole (a)
C2Hx    Reaction               Gas Phase    Fe/W(110)    Ni(111)    Pt(111)
C2H4    H2C=CH2 → H3C-CH       72           -15          2          14
C2H3    H2C=CH → H3C-C         45           -25          -15        -8
C2H2    HC≡CH → H2C=C          44           -41          -25        -13
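The isomerization enthalpies of Table 4, and the ΔH values quoted in the text, can be recomputed from Table 3. With bond energies counted as positive, ΔH is the total bond energy of the reactant minus that of the product. For the gas phase this comes from the D column; for a metal it comes from the normalized $D + Q + (8 - x)Q_H$ column. The Python sketch below transcribes a few Table 3 rows to make this bookkeeping explicit. It is a consistency check under that sign convention, not the authors' code, and the function names are ours.

```python
Q_H = {"Fe/W": 66, "Ni": 63, "Pt": 61}  # atomic H heats, Table 3 footnote (a)

# species: (number of H atoms x, gas-phase D, {metal: Q}), kcal/mole, Table 3
SPECIES = {
    "H2C=CH2": (4, 538, {"Fe/W": 20, "Ni": 15, "Pt": 12}),
    "H3C-CH": (4, 466, {"Fe/W": 107, "Ni": 85, "Pt": 70}),
    "H2C=CH": (3, 421, {"Fe/W": 71, "Ni": 55, "Pt": 44}),
    "H3C-C": (3, 376, {"Fe/W": 141, "Ni": 115, "Pt": 97}),
    "HC≡CH": (2, 392, {"Fe/W": 25, "Ni": 18, "Pt": 14}),
    "H2C=C": (2, 348, {"Fe/W": 110, "Ni": 87, "Pt": 71}),
}


def total_bond_energy(species, metal=None):
    """Gas-phase D, or the normalized chemisorbed total D + Q + (8 - x) Q_H."""
    x, d, q = SPECIES[species]
    if metal is None:
        return d
    # the (8 - x) H atoms missing relative to C2H8 are counted as chemisorbed H
    return d + q[metal] + (8 - x) * Q_H[metal]


def reaction_enthalpy(reactant, product, metal=None):
    """Delta H = bonds broken - bonds formed = total(reactant) - total(product)."""
    return total_bond_energy(reactant, metal) - total_bond_energy(product, metal)


if __name__ == "__main__":
    for metal in (None, "Fe/W", "Ni", "Pt"):
        print(metal or "gas", reaction_enthalpy("H2C=CH", "H3C-C", metal))
```

Because the chemisorbed totals are normalized to C2H8, the same function also covers steps that release an H atom, such as C2H4 → CH3C on Pt, Ni, and Fe/W.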
TABLE 5

Activation Barriers for Forward ΔE*f and Reversed ΔE*r Reactions of Chemisorbed C2Hx (a)

                                                      ΔE*f (c)               ΔE*r (d)
C2Hx   Reaction                      D_CX (b)    Fe/W    Ni     Pt       Fe/W    Ni     Pt
C2H6   CH3CH3,g ⇄ CH3CH2 + H          98         -3      5      8        38      19     10
       CH3CH3 ⇄ CH3CH2 + H            98         3       10     13       35      19     10
C2H5   CH3CH2 ⇄ CH3 + CH2             100        18      24     33       20      6      0
              ⇄ CH2CH2 + H            38         16      11     4        0       2      0
              ⇄ CH3CH + H             110        21      24     25       20      13     7
C2H4   CH2CH2,g ⇄ CH2 + CH2           172        -2      17     36       36      11     0
                ⇄ CH2CH + H           117        -3      7      13       20      8      1
       CH2CH2 ⇄ CH2 + CH2             172        18      32     48       34      11     0
              ⇄ CH2CH + H             117        17      22     25       17      8      1
       CH3CH ⇄ CH3 + CH               92         19      23     27       24      10     0
             ⇄ CH2CH + H              45         25      21     18       10      9      8
             ⇄ CH3C + H               90         17      19     20       27      22     18
C2H3   CH2CH ⇄ CH2 + CH               157        21      31     38       39      18     2
             ⇄ CHCH + H               29         14      9      5        5       6      7
             ⇄ CH2C + H               73         5       7      9        37      29     24
       CH3C ⇄ CH3 + C                 83         5       8      11       43      29     19
            ⇄ CH2C + H                28         2       15     13       9       22     20
C2H2   CHCH,g ⇄ CH + CH               230        -4      19     36       54      21     0
              ⇄ CHC + H               133        -12     1      11       39      18     8
       CHCH ⇄ CH + CH                 230        21      37     50       50      21     0
            ⇄ CHC + H                 133        13      19     25       27      18     8
       CH2C ⇄ CH2 + C                 165        20      27     32       49      29     14
            ⇄ CHC + H                 89         34      32     31       4       3      1
C2H    CHC ⇄ CH + C                   178        13      22     29       71      47     29

(a) The barriers for gas-phase ethane, ethylene, and acetylene are also included. All energies in kcal/mole.
(b) The difference between the gas-phase total bond energies of the reactant and products (see Table 3).
(c) Equations 9 or 12. The values of Q are from Table 3.
(d) Equations 13 or 14.


For ethylene C2H4 on Pt(111) and Ni(111), our estimates of the heats of chemisorption are $Q_{C_2H_4}$ = 12 and 15 kcal/mole, respectively, in excellent agreement with the experimental range $Q_{C_2H_4}$ = 11-13 kcal/mole for Pt, Pd, Ru, and Ni. Consistently, for the metal range Ni-Pt, we find a distinct gas-phase C-H bond dissociation barrier $\Delta E^*_{CH,g}$ = 7-13 kcal/mole, which explains why chemisorbed molecular C2H4 requires elevated temperatures for chemical transformations. As seen from Table 3, the formation of CH3C from C2H4 is moderately endothermic on Pt, practically thermoneutral on Ni, and highly exothermic on Fe/W (ΔH = 16, -1, -25 kcal/mole, respectively). On Pt(111), we predict CH3C to be rather stable, since the calculated C-C bond scission barrier is $\Delta E^*_{CC}$ = 11 kcal/mole. On Ni(111) and Fe/W(110), however, we predict this barrier to be smaller: 8 and 5 kcal/mole, respectively.
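The barriers quoted above can be retraced with Eqs. 9, 10, 12 and 13, as an illustrative check rather than the authors' code. The Python sketch below uses these Pt(111) inputs in kcal/mole:

- Q_CH3 = 38, which we assume is half of the CH3 + CH3 chemisorption entry of Table 3;
- Q_C = 150 and Q_H = 61 (Table 3, footnote a);
- Q_CH3C = 97 and Q_CH2C = 71 (Table 3);
- the gas-phase D values of Table 5.

Rounding to whole kcal/mole mirrors how the tables report values. The test repeats the C-C scission check for Ni and Fe/W using the corresponding Table 3 entries.

```python
def one_dim_barrier(d_ab, q_a, q_b):
    """Eq. 10."""
    return d_ab - (q_a + q_b) + q_a * q_b / (q_a + q_b)


def forward_barrier(d_ab, q_a, q_b, q_ab):
    """Eq. 12: dissociation of chemisorbed AB into A_s + B_s."""
    return 0.5 * (one_dim_barrier(d_ab, q_a, q_b) + q_ab)


def reverse_barrier(d_ab, q_a, q_b, q_ab):
    """Recombination A_s + B_s -> AB_s (same form in Eqs. 13 and 14),
    with the gas-phase dissociation barrier taken from Eq. 9."""
    de_ab_g = 0.5 * (one_dim_barrier(d_ab, q_a, q_b) - q_ab)
    return q_a + q_b - d_ab + de_ab_g


# Pt(111) inputs in kcal/mole (sources given in the lead-in; Q_CH3 is assumed)
Q_C, Q_H, Q_CH3 = 150.0, 61.0, 38.0
Q_CH3C, Q_CH2C = 97.0, 71.0

cc = forward_barrier(83.0, Q_CH3, Q_C, Q_CH3C)   # CH3C -> CH3 + C
ch = forward_barrier(28.0, Q_CH2C, Q_H, Q_CH3C)  # CH3C -> CH2C + H
print(round(cc), round(ch), round(reverse_barrier(83.0, Q_CH3, Q_C, Q_CH3C)))  # prints: 11 13 19
```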

Thus the model concludes that the stability of CH3C decreases in the order Pt > Ni > Fe/W. Indeed, ethylidyne has readily been observed upon decomposition of ethylene on Pt, Pd, Rh, and Ru (ref. 5), only at high adsorbate coverages on Ni (ref. 6) and probably on Co (ref. 7), but not on more active metals. One can add that on Pt(111), the C-C bond cleavage CH3C → CH3 + C ($\Delta E^*_{CC}$ = 11 kcal/mole) seems to be more favorable than the C-H bond cleavage CH3C → CH2C + H ($\Delta E^*_{CH}$ = 13 kcal/mole), so the molecule will retain most of its hydrogen up to the point of C-C bond scission, in agreement with the 13C NMR experiment (ref. 8).

We predict that acetylene chemisorbs slightly more strongly than ethylene, with $Q_{C_2H_2}$ = 14, 18, 25 kcal/mole for Pt(111), Ni(111), and Fe/W(110), respectively. Experimental data on $Q_{C_2H_2}$ (usually from TPD spectra) are not available because C2H2 begins to decompose before it desorbs. Consistently, we found the gas-phase C-H bond cleavage to be nonactivated on Ni and especially on Fe/W, namely $\Delta E^*_{CH,g}$ = 1 and -12 kcal/mole, respectively. On Pt, where $\Delta E^*_{CH,g}$ = 11 kcal/mole, the first surface reaction appears to be the distinctly exothermic isomerization CHCH → CH2C, for which we found ΔH = -13 kcal/mole. Indeed, the formation of the vinylidene CH2C intermediate on Pt(111) was first suggested from EELS spectra and confirmed on Pt particles by 13C NMR analysis (ref. 9). However, this isomerization is not a favorable route to C-C bond scission, since CH2C → CH2 + C would require $\Delta E^*_{CC}$ = 32 kcal/mole. We therefore predict that the dehydrogenation CHCH → CHC + H will occur instead ($\Delta E^*_{CH}$ = 25 kcal/mole), followed by the C-C bond scission CHC → CH + C ($\Delta E^*_{CC}$ = 29 kcal/mole). Thus, contrary to the decomposition of CH3C, one can expect a substantial loss of hydrogen before C-C bond rupture, again in agreement with the 13C NMR data (ref. 9).

On Fe/W(110), we project a rather different situation, because for CHCH not only C-H but also C-C bond cleavage seems to be nonactivated from the gas phase ($\Delta E^*_{CC,g}$ = -4 kcal/mole). Thus, on Fe surfaces, one can expect acetylene to decompose rapidly into CHx fragments. This model projection is consistent with the fact that upon heating of chemisorbed CHCH, only CHx intermediates have been observed on various Fe surfaces (ref. 10). Between Fe/W and Pt, the decomposition products may be a variety of C2Hx and CHx species, depending on metal composition and reaction conditions. For example, on Ni surfaces, rapid decomposition of CHCH to (partly) HCC and (mainly) CHx species was reported (ref. 11). At the same time, on Ru(001), whose activity is intermediate between those of Pt(111) and Ni(111), the whole set of HxCC species, x = 1, 2, 3, resulting from isomerization (CHCH → CH2C), dehydrogenation (CHCH → CHC + H) and rehydrogenation (CHCH + H → [CH2CH] → CH3C), was identified by EELS (ref. 12).

The model can also shed light on the regularities of hydrogenolysis, C2H6 + H2 → 2CH4. In general, by comparing possible C-C bond scission routes on Pt vs Ni vs Fe/W (cf. Table 5), one can easily see that the hydrogen content (x) of the hydrocarbon species C2Hx undergoing this scission decreases in the order Fe/W > Ni > Pt, in agreement with extensive experimental observations (ref. 13).

Similar analyses have been carried out successfully for other heterogeneous catalytic processes, such as CO hydrogenation to methane CH4 and methanol CH3OH, and the decomposition of formic acid HCOOH (ref. 3).

CONCLUSION 

The  analytic  BOC-MP  model  effectively  treats  reaction  energetics  on 
transition-metal  surfaces.  This  efficiency  is  of  particular  importance  because  the  model 
is  based  on  a  few  assumptions,  within  which  the  model  interrelations  are  exact  for 
atomic  adsorbates  and  well  defined  for  molecular  adsorbates,  the  same  analytic 
formalism  being  used  to  treat  both  diatomic  and  polyatomic  molecules.  Moreover, 
these  interrelations  are  expressed  in  terms  of  observables  only  (the  heats  of 
chemisorption  and  various  constants),  which  makes  comparison  with  experiment 
typically direct and unambiguous. The rigor and simplicity of this back-of-the-envelope model make it a promising tool for exploring surface reactivity.
DISCUSSION


DURUP - 1. It looks paradoxical that the effect of the environment on the metal atom (crystallographic surface, etc.) directly enters your theory through the atomic adsorption energy, whereas the adsorbed atom environment enters the metal bond energy only indirectly, although it is a major perturbation.

2. Since you called on Karl Popper, could you give an example of a clear-cut prediction in a case where the experiment has not yet been performed ?

SHUSTOROVICH -

1. I don't see any paradox. The environment of the adsorbed atom A in the coordinated molecule AB enters via the gas-phase dissociation energy $D_{AB}$, similar to that of the metal atom M via the atomic chemisorption energy $Q_A$. Actually, an interplay of these energetic parameters, $Q_A$ (and/or $Q_B$) and $D_{AB}$, determines the heat of molecular chemisorption $Q_{AB}$ [cf. Eqs. (4)-(8)].

2. The Popperian criterion of falsifiability requires making model predictions that can be proved wrong. In a sense, all the calculated values of $Q_{AB}$ and $\Delta E^*_{AB}$ given above are model predictions, since they are obtained from the same set of equations with fixed parameters. As an example of a clear-cut prediction awaiting the experimental verdict, one can take the thermochemistry of C2Hx transformations (see Tables 3 and 4).


1. Are your results on chemisorption heat different from those obtained by R. Hoffmann for CO on Ni(111) (-1.66 eV) by tight-binding methods [J.A.C.S., 1985, 107, 578-584] ?

2. Can you state the principle of bond order conservation more precisely ?
SHUSTOROVICH  - 

1. Since the BOC-MP model is different, the more informative question would be how close the calculated values of $Q_{CO}$ are to the experimental value of -1.2 eV. Our value is 1.3 eV (cf. Table 1).

2. If you mean the justification of the BOC principle, I can only refer you to thorough studies, both computational (ab initio) and experimental (X-ray), of a variety of linear three-center systems A-B-C, where additive BOC at unity ($x_{AB} + x_{BC} = 1$) has invariably been found (see the list of references in the review, ref. 3). The BOC condition used in our work is an extension of this well-established regularity to polycenter interactions of quasi-spherical character (where the energy depends only on the distance).


ANGYAN - 1. Which bond order definition was used in the ab initio calculations you mentioned as justifying bond order conservation?

2. I should like to call your attention to a recent work using a Mulliken-type definition of the bond order, which also confirms bond order conservation (Lendvay, J. Phys. Chem.).


SHUSTOROVICH -

1. The equilibrium distances $r_1$ and $r_2$ in A-B-C have been calculated and, by comparison with the known values $r_{01}$ and $r_{02}$ for the gas-phase diatomics A-B and B-C, the bond orders $x_1 = \exp[-(r_1 - r_{01})/a_1]$ and $x_2 = \exp[-(r_2 - r_{02})/a_2]$ have been determined. In all cases, BOC at unity, $x_1 + x_2 = 1$, holds within ±0.01.

2.  Thank  you. 
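The exponential bond-order relation quoted in the answer above, x = exp[-(r - r0)/a], can be illustrated with a short sketch. Assuming that form, the code computes x1 for a given A-B distance, imposes BOC at unity (x2 = 1 - x1), and inverts the relation (r = r0 - a ln x) to obtain the B-C distance implied by the constraint. The function names and the parameter values r01, r02, a1, a2 are hypothetical placeholders, not data from the discussion.

```python
import math

def bond_order(r, r0, a):
    """Exponential bond order x = exp[-(r - r0)/a]; x = 1 when r equals r0."""
    return math.exp(-(r - r0) / a)

def distance_from_order(x, r0, a):
    """Inverse relation r = r0 - a ln x, valid for 0 < x <= 1."""
    return r0 - a * math.log(x)

def boc_partner_distance(r1, r01, a1, r02, a2):
    """Return r2 in A-B-C implied by BOC at unity, x1 + x2 = 1."""
    x1 = bond_order(r1, r01, a1)
    if not 0.0 < x1 < 1.0:
        raise ValueError("BOC at unity needs 0 < x1 < 1, i.e. r1 > r01")
    return distance_from_order(1.0 - x1, r02, a2)

# Hypothetical parameters (angstrom), for illustration only.
r01, a1 = 0.74, 0.30
r02, a2 = 0.74, 0.30
r1 = 0.93
r2 = boc_partner_distance(r1, r01, a1, r02, a2)
x1 = bond_order(r1, r01, a1)
x2 = bond_order(r2, r02, a2)
print(f"r2 = {r2:.3f} A, x1 + x2 = {x1 + x2:.6f}")
```

Setting r1 = r01 + a1 ln 2 gives x1 = x2 = 1/2 and, with identical parameters for both bonds, r2 = r1: the symmetric midpoint of the A-B-C system.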
POTENTIAL MAPS OF CH4, H2O AND CH3OH IN SILICALITE.

INFLUENCE  OF  THE  SILICALITE  STRUCTURE. 

F.  VIGNE-MAEDER 

Institut  de  Recherche  sur  la  Catalyse*,  2  avenue  Albert  Einstein,  69626  Villeurbanne 
Cedex,  France 

ABSTRACT 

The interaction energy of methane, water and methanol with silicalite has been evaluated using an atom-atom potential containing atomic parameters, charges and dipoles deduced from ab initio calculations on small systems or fragments. Experimental geometries have been adopted for the silicalite framework and for the adsorbed molecules, all of which have been assumed to be rigid. The dependence of the potential values on the positions of the silicalite atoms has been studied by mapping the potential distributions obtained with three different experimental structures.

I.  INTRODUCTION 

Zeolites are well-crystallised silicoaluminates built from SiO4 tetrahedra linked to each other by sharing all four oxygens. The isomorphous substitution of Si4+ by Al3+ creates an excess negative charge which is neutralized by metallic cations or protons. Owing to their microporous structure, these materials have particular properties, such as large adsorption capacity, molecular shape selectivity and catalytic activity, and they are widely used in industry. From another point of view, they are also ideal model systems for fundamental studies of microporous compounds, because experimental data on structure or sorption are generally available and much of the methodology developed for biological systems should be directly transferable.

We have studied the interaction of small molecules with silicalite, which is pure silica and corresponds to the end member of the ZSM-5 series, with an Si/Al ratio approaching infinity. It provides an example of a microporous adsorbent without any adsorption sites of a chemical nature, whose adsorption properties are related only to the channel structure. The framework of all ZSM-5 zeolites is characterized by a network of straight channels interconnected by sinusoidal channels, both consisting of 10-membered oxygen rings.

The potential model that we used to calculate the interaction energy between silicalite and the adsorbed molecules CH4, H2O and CH3OH is briefly described in section II.

From the interaction energies, a theoretical value of the initial heat of adsorption can be estimated by averaging over the orientations and positions of the adsorbed molecule, assuming a Boltzmann distribution. The experimental values are satisfactorily reproduced, but some regions of deep potential have been found (ref. 1). This is surprising for silicalite, which does not contain any adsorption sites and for which rather uniform potential distributions could be expected. As these regions are close to the channel walls, we have examined the effect of the positions of the atoms of the silicalite framework. In section III we present some potential maps obtained using the silicalite structures given by three different experimental works.

* Laboratoire propre du CNRS (CNRS in-house laboratory), under agreement with the Université Claude Bernard

II.  COMPUTATIONAL  PROCEDURE 


As usual, the interaction energy has been expressed as the sum of four contributions: the electrostatic, polarization, dispersion and repulsion energies. The interacting systems are assumed to be rigid, so that each part can be represented as a sum of pairwise additive terms

$$E = \sum_i \sum_j E_{ij}$$

where i and j denote the atoms (or other particular points) of the silicalite and of the adsorbed molecule, respectively. For the numerical evaluation, we have used a procedure developed by Claverie and coworkers (ref. 2). The dispersion and repulsion terms are of Buckingham (6-exp) type and, to obtain the electrostatic and polarization parts, the ab initio charge distribution of each isolated system is represented as a multicentered multipole expansion.
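To make the pairwise sum concrete, here is a minimal Python sketch. It sums, over framework sites i and adsorbate sites j, a point-charge Coulomb term plus a Buckingham (6-exp) repulsion-dispersion term. This is a deliberate simplification of the procedure described above: the atomic dipoles and the polarization contribution are omitted, and the function names, parameters A, B, C and charges are hypothetical illustrative values, not those of Claverie and coworkers.

```python
import math

COULOMB_KCAL = 332.06  # e^2/(4*pi*eps0) in kcal*angstrom/mol

def pair_energy(qi, qj, r, A, B, C):
    """Point-charge Coulomb term plus Buckingham 6-exp term (kcal/mol)."""
    electrostatic = COULOMB_KCAL * qi * qj / r
    repulsion_dispersion = A * math.exp(-B * r) - C / r**6
    return electrostatic + repulsion_dispersion

def interaction_energy(framework, adsorbate, params):
    """E = sum_i sum_j E_ij over framework sites i and adsorbate sites j.

    framework, adsorbate: lists of (element, charge, (x, y, z)) in angstrom.
    params: maps an (element_i, element_j) pair to Buckingham (A, B, C).
    """
    total = 0.0
    for elem_i, q_i, pos_i in framework:
        for elem_j, q_j, pos_j in adsorbate:
            r = math.dist(pos_i, pos_j)
            A, B, C = params[(elem_i, elem_j)]
            total += pair_energy(q_i, q_j, r, A, B, C)
    return total

# Hypothetical toy data: two framework O sites and one neutral probe site.
params = {("O", "C"): (50000.0, 3.6, 600.0)}
framework = [("O", -1.3, (0.0, 0.0, 0.0)), ("O", -1.3, (3.0, 0.0, 0.0))]
probe = [("C", 0.0, (1.5, 3.5, 0.0))]
print(f"E = {interaction_energy(framework, probe, params):.3f} kcal/mol")
```

Because the probe is neutral here, only the Buckingham part contributes; giving it a charge switches on the Coulomb term, which is the part that makes the water and methanol maps sensitive to the local charge environment.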

The required atomic parameters for carbon, oxygen and hydrogen have been taken from previous works. The parameters for silicon have been calibrated against the SCF interaction energy of the disiloxane (H3SiOSiH3)-water system at the equilibrium distance. All the SCF calculations have been performed with pseudopotentials and minimal basis sets.

In this procedure, the very large silicalite crystal must be replaced by a molecular system of reasonable size. Clusters with diameters of 24 Å appeared to be an acceptable compromise between accuracy and computer time. They contain about 200 silica tetrahedra whose free valences at the external surface are saturated with hydrogen atoms.

These clusters are nevertheless too large to allow SCF calculations. Their multipole representations have been reconstructed from those of dimers (HO)3SiOSi(OH)3 and monomers Si(OH)4 by superposition and subtraction of the parts in excess (ref. 3). For example, a trimer (HO)3SiOSi(OH)2OSi(OH)3 would be obtained by superposing two dimers and subtracting the monomer Si(OH)4 centred on the middle silicon, which both dimers contain.
Fig. 1. Water-silicalite: potential map in the mirror plane of a sinusoidal channel (y ≈ 5 Å) obtained with different crystallographic data for silicalite: (A) ref. 7, (B) ref. 6, (C) ref. 5.

Silicalite contains 12 different crystallographic sites for the silicon atoms, so we had to perform SCF calculations for 12 monomers and 26 dimers of different geometries, which explains the use of pseudopotentials. The multipole expansions have been limited to charges and dipoles distributed on all the atoms. More details about the computational procedure are given in ref. 1.
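The superposition-and-subtraction idea can be sketched as bookkeeping on per-atom multipoles. In this hypothetical Python sketch, each fragment is a dict mapping an atom label to its (charge, dipole) pair; a trimer is assembled from two overlapping dimers, and the monomer that both dimers contain is subtracted once, so shared atoms are counted only once and capping hydrogens absent from the trimer cancel out. The labels, charges and the `superpose` helper are invented for illustration, and the actual scheme of ref. 3 (including the treatment of terminal OH groups and of residual charges) may differ.

```python
from collections import defaultdict

def superpose(terms):
    """Combine per-atom (charge, dipole) data of fragments with integer weights.

    terms: list of (weight, fragment), fragment = {label: (q, (mx, my, mz))}.
    Atoms with net multiplicity 1 are kept with summed values; atoms with net
    multiplicity 0 (e.g. capping H atoms absent from the target) are dropped,
    and their small residual charges are simply discarded in this sketch.
    """
    count = defaultdict(int)
    charge = defaultdict(float)
    dipole = defaultdict(lambda: [0.0, 0.0, 0.0])
    for weight, fragment in terms:
        for label, (q, mu) in fragment.items():
            count[label] += weight
            charge[label] += weight * q
            for k in range(3):
                dipole[label][k] += weight * mu[k]
    result = {}
    for label, n in count.items():
        if n == 1:
            result[label] = (charge[label], tuple(dipole[label]))
        elif n != 0:
            raise ValueError(f"atom {label} has net multiplicity {n}")
    return result

Z = (0.0, 0.0, 0.0)  # dipoles omitted in this toy example
# Hypothetical fragments; labels refer to atoms of the target trimer
# Si1-Ob1-Si2-Ob2-Si3 (terminal OH groups left out for brevity).
dimer_12 = {"Si1": (2.70, Z), "Ob1": (-1.30, Z), "Si2": (2.68, Z),
            "Ob2": (-1.28, Z), "Hb2": (0.50, Z)}
dimer_23 = {"Si2": (2.69, Z), "Ob2": (-1.31, Z), "Si3": (2.70, Z),
            "Ob1": (-1.27, Z), "Hb1": (0.50, Z)}
monomer_2 = {"Si2": (2.66, Z), "Ob1": (-1.26, Z), "Ob2": (-1.27, Z),
             "Hb1": (0.49, Z), "Hb2": (0.49, Z)}
trimer = superpose([(1, dimer_12), (1, dimer_23), (-1, monomer_2)])
for label, (q, _) in sorted(trimer.items()):
    print(label, round(q, 2))
```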

III.  POTENTIAL  MAPS 


The framework structure was not taken from the work of Flanigen on silicalite itself (ref. 4), because some Si-O distances seem doubtful; instead, we used data on isostructural ZSM-5 zeolites. We have considered the results given by three different authors who used single-crystal X-ray analysis on samples of different Al content: 1.1 Al per unit cell for Olson (ref. 5), 8.0 for Lermer (ref. 6) and 0.3 for van Koningsveld (ref. 7). All samples contained tetrapropylammonium ions, the template ions required for the growth and stabilization of the crystals, located at the channel intersections.

Fig. 2. Water-silicalite: potential map in a longitudinal section of a straight channel (x ≈ 9 Å) with different geometries for silicalite: (A) ref. 7, (B) ref. 6, (C) ref. 5.

The general features of the straight and sinusoidal channels are shown in Figures 1 and 2 by the equipotential curves corresponding to the highest potential value (-2 kcal/mol for water). It is worth noting that these curves do not delimit the channel void space but represent the positions of the water oxygen atom when the molecule is as close as possible to the channel wall. The circular traces of the oxygen atoms constituting the channel wall are easy to observe along these curves. Both channels have roughly the same diameter and are interrupted by larger volumes at the intersections (x = 0, 10 and 20 in Fig. 1; y = 5 and 15 in Fig. 2).
Fig. 3. Methanol-silicalite: potential map in the mirror plane of a sinusoidal channel. Silicalite structures taken from refs. 7 (A) and 5 (C).


The potential maps are plotted with the interaction energy values minimized with respect to the orientation. All the maps (Figs. 1-4) show a domain of high potential in the middle of the intersection (roughly -3 kcal/mol for water and methane, -7 kcal/mol for methanol), where the molecule is most distant from the silicalite atoms. In the channels, by contrast, the potential is deeper because the molecule is closer to several framework atoms. The potential distribution is nearly uniform for methane because the main contribution is the dispersion term. It has been shown (ref. 8) that in large cavities the dispersion-type interaction energy is deeper near the walls and tends to become uniform as the diameter of the cavity decreases toward that of the adsorbed molecule. For water and methanol, the electrostatic contribution, which is sensitive to specific charge environments, yields a potential distribution in the channels that is more irregular than for methane. The maps for methanol look like the sum of the maps for methane (the values of the dispersion part are very similar) and for water (similar values of the electrostatic part).

Table 1 reports approximate values of the initial heat of adsorption obtained by averaging the potentials over the orientation and position of the adsorbed molecule, assuming a Boltzmann distribution. Our results are in good qualitative agreement with the experimental ones. In particular, the increase from CH4 to CH3OH is well reproduced. Our best values, corresponding to the structure of van Koningsveld (A), differ from the experimental ones by about 3 kcal/mol.
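The Boltzmann averaging described above can be illustrated with a minimal Python sketch. It assumes that interaction energies E_k (kcal/mol) have already been sampled on a grid of positions and orientations, forms the Boltzmann-weighted mean <E> = sum_k E_k exp(-E_k/RT) / sum_k exp(-E_k/RT), and reports Q ≈ RT - <E>. The RT term follows the usual low-coverage isosteric-heat expression for a rigid molecule; whether the authors included it is not stated, so treat it as an assumption. The sample energies and function names are invented.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def boltzmann_average(energies, T):
    """Boltzmann-weighted mean of energies (kcal/mol) at temperature T (K)."""
    beta = 1.0 / (R_KCAL * T)
    e_min = min(energies)  # shift to avoid overflow; cancels in the ratio
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    return sum(w * e for w, e in zip(weights, energies)) / sum(weights)

def initial_heat_of_adsorption(energies, T=300.0):
    """Q ~ RT - <E>, positive for favourable (negative) interaction energies."""
    return R_KCAL * T - boltzmann_average(energies, T)

samples = [-3.0, -5.5, -6.2, -4.8, -12.4, -7.1]  # hypothetical grid energies
print(f"<E> = {boltzmann_average(samples, 300.0):.2f} kcal/mol")
print(f"Q   = {initial_heat_of_adsorption(samples):.2f} kcal/mol")
```

With these invented numbers the single -12.4 kcal/mol value dominates the weights, which illustrates why a few spurious deep potential holes can strongly bias an averaged heat of adsorption.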


Fig. 4. Methane-silicalite: potential map in the mirror plane of a sinusoidal channel. Silicalite structures taken from refs. 6 (B) and 5 (C).
TABLE 1

Adsorption heat $Q_{theo}$ estimated from the potential values obtained with the silicalite structures of ref. 7 (A), ref. 6 (B) and ref. 5 (C). The experimental data $Q_{exp}$ are averages over several literature values.

The distribution of the potentials appears to be less satisfactory. In particular, the potential maps show small regions of very deep potential that would correspond to strong adsorption sites. This is not corroborated by experimental work, in which the adsorption sites of H-ZSM-5 zeolites are identified with the hydrogen atoms; such sites should therefore not appear in our model of silicalite. Potential holes are obtained with all three silicalite structures, but they have different depths and positions. For water (Figs. 1, 2), for example, the deepest potentials are located at the channel intersection (-19.5 kcal/mol) for Olson's structure (C) and in the sinusoidal channel for the two other structures, (B) and (A) (-13.5 and -12.4 kcal/mol, respectively). For methanol (Fig. 3) their positions are similar, but for methane the potential holes are more numerous and less deep (Fig. 4). Examination of the different contributions to the interaction energy shows that all these large negative values of the potential correspond to a large electrostatic term.

We have compared more closely the water potentials obtained with structures (B) and (C), between which differences of up to 4 kcal/mol were found for the same positions and orientations of the water molecule. Combining structure (C) with the charges and dipoles calculated for structure (B) was not sufficient to fully reproduce the potentials obtained with (B). They were, however, satisfactorily reproduced by taking structure (B) with the charges and dipoles corresponding to (C), so the geometry adopted in the interaction energy calculations appears to be the determining factor. More precisely, it is the positions of only a few framework atoms, close to the adsorbed molecule, that matter most. The potentials are sensitive to very small shifts in the atomic positions (no larger than 0.01 Å) because of the rather large charges carried by the silicon and oxygen atoms (about 2.7 and 1.3 a.u., respectively).
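A rough first-order estimate shows why such small shifts matter. Assuming a single point-charge Coulomb pair with the silicon charge quoted above (2.7), a hypothetical adsorbate-site charge of magnitude 0.4 and a separation of 3 Å (both of these are illustrative assumptions, not values from the text), a displacement δr changes the pair energy by approximately

$$\delta E \approx \left|\frac{\partial E_{ij}}{\partial r}\right|\delta r = \frac{332.06\,|q_i q_j|}{r^2}\,\delta r = \frac{332.06 \times 2.7 \times 0.4}{3^2} \times 0.01 \approx 0.4\ \text{kcal/mol}.$$

Summed over the several framework atoms close to the molecule, contributions of this size could plausibly add up to differences of a few kcal/mol, although they may also partly cancel.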

Thus the influence of the framework geometry on the potential distribution could result from an overestimation of the electrostatic part of the interaction energy. Further work will be needed to improve the quality of the multipole representation of silicalite, i.e. the quality of the SCF calculations on the fragments, for example by using larger basis sets without pseudopotentials and by taking into account the electric field created by the part of the crystal surrounding the fragment considered (ref. 9).
DISCUSSION


ANGYAN - 1. What are the charges of the Si and O atoms in your model?

2. The calculation of the electrostatic interaction energy based on clusters with 24 Å edges may be misleading. In fact, these clusters are only a little larger than one unit cell, which is in general not sufficient for the multipole lattice sums to converge. These lattice sums are only conditionally convergent in direct space. Standard techniques are available to make them converge (Ewald, Bertaut), and in my opinion they should be used in the present case.

VIGNE-MAEDER  - 

1. Mean values: 2.7 for Si, 1.3 for O, 0.5 for H. These values depend slightly on the crystallographic site (variations below 5%).

2. These techniques should indeed be used, at least to estimate the accuracy of our results, but they are very time-consuming and not easy to implement. The small regions of deep potential that seem doubtful in our potential maps probably originate in local interactions and not in the truncation of the crystal.


MONQUE - Have you worked on, or do you intend to model, silicalite-type zeolites or ZSM-5 with aluminium atoms included in the framework? As you probably know, they are widely used in catalysis.

VIGNE-MAEDER - We will probably consider such zeolites in the future. They are more difficult to model because they contain four types of atoms (Si, O, Al and cation) and the positions of the aluminium atoms are not known experimentally. Their acidic properties have so far generally been studied on small fragments of 2 to 5 Si or Al atoms, and it will indeed be very important to introduce the influence of the framework. For catalysis, the modelling of silicalite is nonetheless of interest, because it is well suited to studying diffusion, which represents an important part of the catalytic process.




Modelling of Molecular Structures and Properties. Proceedings of an International Meeting, Nancy, France, 11-15 September 1989, J.-L. Rivail (Ed.)

Studies in Physical and Theoretical Chemistry, Volume 71, pages 143-164

© 1990 Elsevier Science Publishers B.V., Amsterdam - Printed in The Netherlands




Conformational studies on macrocyclic receptors and
on their substrate complexes.


Georges  Wipff 

Institut de Chimie, 1, rue Blaise Pascal, 67000 Strasbourg (France)


SUMMARY 

Several computer modelling techniques are used to provide deeper insights into the recognition properties of synthetic macrocyclic receptors, and to address the questions of preorganisation, complementarity and binding selectivity. In addition, comparison with experimental data allows present theoretical and computational limitations to be outlined.

INTRODUCTION 

Macrocyclic receptors, synthesized and studied extensively since the pioneering work of C. Pedersen [1], J.-M. Lehn [2] and D.J. Cram [3], represent a fascinating field of research at the frontiers of chemistry, physics and biology. Among them, crown ethers, cryptands, spherands, podands and derivatives are able to act as receptors, presenting a concave molecular region which binds small species such as cations, anions and neutral molecules strongly and with marked specificity [4-6].

We have been interested in modelling such receptors and their inclusion complexes because of their remarkable recognition properties, similar to those observed in large and complex biological systems. Recognition may extend beyond simple non-covalent association to processes which follow complexation, such as transport or catalysis. In contrast to large biological species, much information is available concerning their constitution and structure(s) in the solid state, their spectroscopic properties in various solutions and environments, and their thermodynamic and kinetic properties. Their relatively small size allows us to perform computer experiments in reasonable time. Finally, the basic and practical questions raised in these modelling studies are in essence similar to those encountered with biological species and processes. We therefore felt it important to test and develop current modelling techniques, combining applied theoretical tools and high-performance computer graphics. In the following, we illustrate uses of molecular mechanics (MM), molecular dynamics (MD) and Monte Carlo (MC) simulations. Computer graphics will not be discussed as such, although it is essential at both ends of the simulations, providing input for the programs as well as for the visual analysis of the results.
CONFORMATIONS AND RECOGNITION

Given the recognition capabilities of these receptors, their structures and conformations are analyzed with a focus on their ability to form "lock and key" complexes with potential substrates. Translating this phenomenological scheme [7-10] into structural features implies that the conformation of the macrocycle is such that a cavity, or at least a concave region, is present with a suitable orientation of the putative binding sites. Whether such a cavity is an intrinsic feature of the free receptor (in the "gas phase") or is induced by the substrate, or by a given solvent or environment, are questions that are addressed using molecular modelling techniques. For instance, the X-ray structures of 18-crown-6 (18-6) and of the bicyclic 222 cryptand in their free state, in the absence of molecular environment, have no preformed cavity suitable for complexation [11, 12]. In such examples one would like to estimate the energy needed to organize the cavities.

Classical conformational analysis describes primary processes such as inversion at atomic centers or rotation around bonds in molecules, taking into account steric and electrostatic intramolecular interactions. For host/guest or receptor/substrate (R/S) complexes, a "supermolecule" type approach should be used, involving the internal energies of R and S, their mutual interaction, and the effect of the environment (e.g. counterions, solvent molecules, etc.) on both R and S.

The basic computational procedures require an adequate sampling of the conformational space of R and of R/S, isolated or in their environment, and a rapid calculation of the energy of each set of coordinates. We will see examples and limitations of MM calculations in vacuo on typical macrocycles: monocyclic 18-6 [13], bicyclic 222 [14] and tricyclic SC24 [15] (Fig. 1). To label the conformers concisely, we refer either to their symmetry (e.g. 18-6_D3d) or to the structure of the complex from which they have been extracted. For instance, SC24_N and 222_K are conformers of the SC24 and 222 cryptands extracted from the NH4+ and K+ cryptates, respectively.


(Fig. 1 structures: 18-6; cryptands with m = n = p = 0: 111; m = 1, n = p = 0: 211; m = n = 1, p = 0: 221; m = n = p = 1: 222.)

Fig. 1: 18-crown-6, the bicyclic and tricyclic cryptands.
Dynamic views of these flexible species and of the "lock and key" complementarity emerge from the analysis of their low-frequency normal modes of vibration and from MD simulations [16]. Statistical analysis of structures found by high-temperature annealed MD simulations provides alternative pictures of the flexibility and conformational preferences [17]. Finally, modelling reaction paths for ion inclusion into compact cages illustrates how the flexibility or rigidity of the cage and the size of the ion affect the intrinsic barriers [18].

Significant theoretical progress has been made recently in the understanding and prediction of binding selectivities in solution. The method is based on the calculation of the change in free energy when one system is mutated into another via small perturbations [19]. The sampling of the conformational space at each step may be obtained by MC or MD simulations. We used an extended approach, called the thermodynamic cycle perturbation method, and determined relative binding free energies. This procedure was shown to account quantitatively for the Cl-/Br- binding selectivity of SC24.4H+ [20]. It has been used recently in the macrocyclic area to calculate the relative binding free energies of Na+/K+ by 18-6 in water [21] and in methanol [22], of nitromethane/malononitrile/acetonitrile by 18-6 [23], and of Na+/K+ by dibenzo-crown ethers [24], as well as the binding of pyridine/pyrazine by one of Rebek's acridine diacid receptors in chloroform [25]. More generally, this promising technique should yield relative free energies of solvation for closely related species or conformers, or of binding involving biological receptors [19]. The studies reported so far, however, deal with systems whose structure and binding properties are known experimentally. Although it is one of the ultimate goals of molecular modelling, quantitative prediction of binding selectivities is still a formidable task, mainly because of conformational sampling and energy representation problems.
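The free energy perturbation idea behind the thermodynamic cycle method can be sketched in a few lines of Python. For each small step from state i to state i+1, the Zwanzig relation gives ΔG_i = -RT ln <exp(-(E_{i+1} - E_i)/RT)>_i, averaged over configurations sampled (by MC or MD) in state i, and the mutation free energy is the sum over steps. In the thermodynamic cycle, the relative binding free energy of substrates S1 and S2 is ΔΔG = ΔG_mut(complex) - ΔG_mut(solvent). The per-window energy differences below are synthetic Gaussian samples standing in for real simulation output, and all function names are hypothetical.

```python
import math
import random

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def zwanzig_step(delta_e, T):
    """Window free energy: -RT ln <exp(-dE/RT)> over sampled dE values."""
    rt = R_KCAL * T
    shift = min(delta_e)  # numerical stabilisation, added back below
    mean_exp = sum(math.exp(-(d - shift) / rt) for d in delta_e) / len(delta_e)
    return shift - rt * math.log(mean_exp)

def mutation_free_energy(windows, T):
    """Sum of window free energies along the S1 -> S2 mutation path."""
    return sum(zwanzig_step(w, T) for w in windows)

def synthetic_windows(mean_per_window, n_windows, n_samples, sigma, seed):
    """Hypothetical stand-in for simulation output: Gaussian dE samples."""
    rng = random.Random(seed)
    return [[rng.gauss(mean_per_window, sigma) for _ in range(n_samples)]
            for _ in range(n_windows)]

T = 300.0
dg_complex = mutation_free_energy(synthetic_windows(0.30, 10, 2000, 0.2, 1), T)
dg_solvent = mutation_free_energy(synthetic_windows(0.10, 10, 2000, 0.2, 2), T)
ddg_bind = dg_complex - dg_solvent  # Delta G_bind(S2) - Delta G_bind(S1)
print(f"ddG_bind = {ddg_bind:.2f} kcal/mol")
```

For Gaussian energy differences the exact window result is μ - σ²/(2RT), so this synthetic example should come out close to 2.0 kcal/mol; with real simulations, poor overlap between neighbouring windows is the usual source of error, which is why the mutation is split into small steps.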

ENERGY  REPRESENTATION  AND  COMPUTATIONAL  TECHNIQUES 

Molecular mechanics and dynamics calculations were performed with the program AMBER [26], using the representation of the potential energy given in reference 27.

The bonds and bond angles are treated as harmonic springs, and a torsional term is associated with the dihedral angles. The interactions between atoms separated by at least three bonds are described within a pairwise additive scheme by a 1-6-12 potential (a Coulomb 1/r term plus Lennard-Jones 1/r^6 and 1/r^12 terms).
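The functional form just described can be written down compactly. The Python sketch below evaluates the individual terms for single internal coordinates: harmonic bond and angle springs (AMBER-style, without a factor 1/2), a cosine torsion, and the nonbonded "1-6-12" pair term in the A/r^12 - B/r^6 + q_i q_j/(ε r) form. Parameter values are placeholders rather than AMBER parameters, and details such as 1-4 scaling factors and the dielectric model are omitted.

```python
import math

COULOMB_KCAL = 332.06  # e^2/(4*pi*eps0) in kcal*angstrom/mol

def bond_energy(r, k_r, r_eq):
    """Harmonic bond spring K_r (r - r_eq)^2."""
    return k_r * (r - r_eq) ** 2

def angle_energy(theta, k_theta, theta_eq):
    """Harmonic angle spring K_theta (theta - theta_eq)^2, angles in radians."""
    return k_theta * (theta - theta_eq) ** 2

def torsion_energy(phi, v_n, n, gamma):
    """Torsional term (V_n / 2) [1 + cos(n phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

def nonbonded_energy(r, a_ij, b_ij, qi, qj, dielectric=1.0):
    """'1-6-12' pair term: Coulomb (1/r) + repulsion (1/r^12) - dispersion (1/r^6)."""
    return (a_ij / r**12 - b_ij / r**6
            + COULOMB_KCAL * qi * qj / (dielectric * r))

# Hypothetical Lennard-Jones parameters, for illustration only.
a_ij, b_ij = 6.0e5, 600.0
r_min = (2.0 * a_ij / b_ij) ** (1.0 / 6.0)  # position of the LJ minimum
print(f"r_min = {r_min:.3f} A, E(r_min) = "
      f"{nonbonded_energy(r_min, a_ij, b_ij, 0.0, 0.0):.4f} kcal/mol")
```

For neutral atoms the pair term reaches its minimum, -B²/(4A), at r = (2A/B)^(1/6); assigning charges such as the q_O and q_N values discussed below adds the Coulomb part on top of this curve.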

The parameters are derived from the AMBER force field [27] and can be found in references 13-16. Most critical is the electrostatic representation of the system, which was calibrated as far as possible from gas-phase data. The lack of an explicit polarization term in this force field led us to use different sets of charges depending on the presence and nature of the complexed ion. For the uncomplexed macrocycles, the charges ($q_O = -0.3$, $q_N = -0.24$) account for the dipole moments of the OMe2 and NMe3 building units. For the alkali cation complexes, larger values ($q_O = -0.6$, $q_N = -0.6$) were required to reproduce experimental M+...OMe2 interaction enthalpies in the gas phase. A similar procedure was used to derive charges involving ammonium sites or the NH4+ substrate. An explicit account of distance- and orientation-dependent polarization energy and of many-body effects would be more satisfactory [28], but this is not incorporated in current modelling software performing MC simulations with reliable parameters, and it requires extensive computer time.

MM optimisations relax the starting structure to the nearest energy minimum and must be performed consistently in terms of force field and procedure. We used conjugate gradient minimization, followed in most cases by a Newton-Raphson optimization to check the second derivatives of the energy and to obtain the normal modes of vibration.

MD simulations were run for 100 ps at 300 K, starting with random velocities and using the Verlet algorithm with a time step of 1 fs [26].
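To illustrate the integrator, here is a minimal velocity Verlet sketch (one common form of the Verlet algorithm; AMBER's own implementation is not reproduced) applied to a single harmonic degree of freedom with a 1 fs time step. Masses are in amu, distances in Å and energies in kcal/mol; the unit-conversion constant is standard, but the oscillator (mass 12 amu, force constant 600 kcal/(mol Å²)) is a hypothetical example. The starting velocity is one draw from the 1D Maxwell-Boltzmann distribution at 300 K, echoing the random initial velocities mentioned above.

```python
import math
import random

ACC = 4.184e-4        # converts (kcal/mol/A) / (g/mol) into A/fs^2
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def velocity_verlet(x, v, mass, force, dt, n_steps):
    """Integrate one coordinate with velocity Verlet; returns a list of (x, v)."""
    a = ACC * force(x) / mass
    traj = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt
        a_new = ACC * force(x) / mass
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        traj.append((x, v))
    return traj

# Hypothetical harmonic oscillator: mass 12 amu, k = 600 kcal/(mol A^2).
mass, k, T, dt = 12.0, 600.0, 300.0, 1.0
rng = random.Random(0)
v0 = rng.gauss(0.0, math.sqrt(ACC * R_KCAL * T / mass))  # 1D Maxwell-Boltzmann draw
traj = velocity_verlet(0.0, v0, mass, lambda x: -k * x, dt, 1000)

def total_energy(x, v):
    """Kinetic + potential energy in kcal/mol."""
    return 0.5 * mass * v * v / ACC + 0.5 * k * x * x

e0, e_end = total_energy(*traj[0]), total_energy(*traj[-1])
print(f"relative energy change after 1 ps: {abs(e_end - e0) / e0:.1e}")
```

The total energy stays within a fraction of a percent over the run, the property that makes Verlet-type integrators the usual choice for MD; the period of this oscillator (about 43 fs) is well resolved by a 1 fs step.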

BUILDING  SELECTED  3D  STRUCTURES 

Finding computationally the 3D Cartesian coordinates of macrocycles is a very challenging problem because of their (poly)cyclic nature and of the way the structural requirements are formulated. For instance, one may want to build the complex of 18-6 with ammonium derivatives (Fig. 2a) such that the crown anchors the NH3+ moiety, the lateral substituents are in axial positions, and the ester carbonyl of the peptidic substrate can be attacked by one sulfur of the receptor.


Fig. 2a (left): "supermolecule" formed between a lateral derivative of 18-6 and an ammonium substrate (from reference 5). Fig. 2b (right): the "Barrel,6H+" receptor (R = CH2-CH2-CH2) of NO3- (from reference 67).

Classical methods based on incremental building in terms of internal coordinates cannot be used because the dihedral angles are unknown. Manipulations of 3D structures that are easily done by the chemist with CPK models [29], involving concerted rotations around several bonds so that a cavity forms while the connectivity is kept and the binding sites are properly oriented, cannot be done with current modelling software, especially for polycyclic systems. On the other hand, building methods such as symmetry repetition of given fragments [30] or drawing structures within a diamond lattice [31] are specific and cannot be used for general purposes.

Crippen's approach [32], based on the partially known matrix of interatomic distances, proved successful in translating constraints, such as ranges of distances between nonbonded centers or of dihedral angles, into distances. From the upper and lower bounds of that matrix, several sets of coordinates could be calculated [33]. The 2D3D graphics interface allowed us to input the data graphically from 2D drawings, and to visualize and analyze the 3D structures generated [34, 35]. For instance, the "in-out" forms of 222 (Fig. 3) were generated by imposing Cc-N-LP angles of 0° and 180°, respectively (Cc is the center of the cavity, and LP is a point along the nitrogen lone pair) [14]. Similarly, coordinates of various "in-out" topomers of SC24 could be obtained [15]. One major limitation of this approach is the small number of different structures produced and the bias introduced by the constraints. In addition, finding solutions requires a balance between short-range and long-range constraints.

Fig. 3: "out-out", "out-in" and "in-in" topomers of the bicyclic 222 cryptand.
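One core step of distance geometry is making the partially known bounds matrix consistent before coordinates are generated. The Python sketch below applies triangle-inequality bound smoothing (u_ij <= u_ik + u_kj and l_ij >= l_ik - u_kj, in a Floyd-Warshall-style triple loop). It is a standard ingredient of the approach, but it is not claimed to be the implementation of refs. 32-35, and the subsequent embedding of distances into 3D coordinates is omitted. The 4-atom chain and its bounds are hypothetical.

```python
def smooth_bounds(lower, upper):
    """Triangle-inequality smoothing of symmetric lower/upper bounds matrices.

    Tightens u_ij <= u_ik + u_kj and l_ij >= l_ik - u_kj (and the mirrored
    lower-bound inequality) in a Floyd-Warshall-style triple loop; raises
    ValueError if a lower bound ends up above its upper bound.
    """
    n = len(upper)
    lo = [row[:] for row in lower]
    up = [row[:] for row in upper]
    for k in range(n):
        for i in range(n):
            for j in range(i + 1, n):
                if up[i][j] > up[i][k] + up[k][j]:
                    up[i][j] = up[j][i] = up[i][k] + up[k][j]
                if lo[i][j] < lo[i][k] - up[k][j]:
                    lo[i][j] = lo[j][i] = lo[i][k] - up[k][j]
                if lo[i][j] < lo[j][k] - up[k][i]:
                    lo[i][j] = lo[j][i] = lo[j][k] - up[k][i]
                if lo[i][j] > up[i][j]:
                    raise ValueError(f"inconsistent bounds for pair ({i}, {j})")
    return lo, up

# Hypothetical 4-atom chain 0-1-2-3: bonds fixed at 1.5 A, 1-3 distances
# (from bond angles) fixed at 2.5 A, 0-3 distance unknown (0 to 100 A).
BIG = 100.0
upper = [[0.0, 1.5, 2.5, BIG],
         [1.5, 0.0, 1.5, 2.5],
         [2.5, 1.5, 0.0, 1.5],
         [BIG, 2.5, 1.5, 0.0]]
lower = [[0.0, 1.5, 2.5, 0.0],
         [1.5, 0.0, 1.5, 2.5],
         [2.5, 1.5, 0.0, 1.5],
         [0.0, 2.5, 1.5, 0.0]]
lo, up = smooth_bounds(lower, upper)
print(f"0-3 distance bounds after smoothing: {lo[0][3]:.2f} to {up[0][3]:.2f} A")
```

Smoothing turns the unknown 0-3 distance into the range 1.0 to 4.0 Å implied by the chain geometry; in a constraint such as the Cc-N-LP angles above, the same mechanism propagates the imposed geometry to pairs of atoms that were not constrained explicitly.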

The ELLIPSE technique of Billeter et al. [36, 37], like DISMAN [38], uses the torsional angles as variables and was able to build selected forms of highly connected and constrained species, such as the "Barrel,6H+"/NO3- inclusion complex (Fig. 2b) [39]. Here, in addition to the ring closure requirements, the constraint was to form H-bonds between the twelve N-H+ groups and NO3-. However, purely geometrical techniques such as ELLIPSE or distance geometry may lead neither to low-energy forms nor to the absolute minimum. They therefore have to be combined with energy optimizations and structure relaxations [40]. P.A. Kollman reported such a combined use of ELLIPSE and MD simulations for 18-6 [37]; Houk et al. modelled phenanthrene macrocycles using ELLIPSE and AMBER or MM2 optimisations [41]. We performed a conformational analysis of 18-6, 222 and SC24 (neutral and protonated), combining distance geometry searches with AMBER [13-15].

Monte Carlo searches taking the dihedral angles as variables [42] can hardly be used for
polycyclic systems, especially when additional structural criteria have to be met. The
Saunders stochastic method [43] combines random atomic displacements with energy refinements,
broadly sampling the conformational space of (poly)cyclic hydrocarbons and "hunting" for the
global minimum. The latter method might be used for macrocyclic receptors as well. Annealing
procedures used in conjunction with Monte Carlo simulations [44] have been reported for open
or monocyclic molecules, but have not yet been tested on polycyclic systems.
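
The random-kick idea can be illustrated on a toy problem. The sketch below is a simplified
variant in the spirit of the stochastic method, not a reproduction of it: a hypothetical
three-coordinate energy function with many local minima stands in for the force field, each
trial displaces all coordinates of a previously found minimum at random, minimizes by
steepest descent, and keeps the result if it is a new minimum. The energy function, kick size
and number of trials are assumptions chosen only for illustration.

```python
import math
import random


def energy(q):
    """Hypothetical toy energy with many local minima (stand-in for a force field)."""
    return sum(0.05 * x * x + math.sin(3.0 * x) for x in q)


def gradient(q):
    return [0.1 * x + 3.0 * math.cos(3.0 * x) for x in q]


def minimize(q, step=0.05, tol=1e-8, max_iter=20000):
    """Plain steepest descent with a fixed step."""
    q = list(q)
    for _ in range(max_iter):
        g = gradient(q)
        if max(abs(v) for v in g) < tol:
            break
        q = [x - step * v for x, v in zip(q, g)]
    return q


def stochastic_search(start, n_trials=200, kick=1.5, seed=0):
    """Random kicks followed by minimization, restarting from minima found so far."""
    rng = random.Random(seed)
    found = [minimize(start)]
    for _ in range(n_trials):
        base = rng.choice(found)                              # pick a known minimum
        trial = [x + rng.uniform(-kick, kick) for x in base]  # random displacement
        q = minimize(trial)                                   # energy refinement
        is_new = all(max(abs(a - b) for a, b in zip(q, m)) > 1e-3 for m in found)
        if is_new:
            found.append(q)
    return sorted(found, key=energy)


minima = stochastic_search([2.0, 2.0, 2.0])
print(len(minima), energy(minima[0]))
```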

Finally,  quenched  or  annealed  MD  simulations  at  high  temperature  incorporating 
constraints  should  also  provide  suitable  structures,  as  used  for  modelling  structures 
from  NMR  data  [45]. 

MOLECULAR MECHANICS OPTIMIZATIONS "IN VACUO". ENERGY COMPARISON
OF X-RAY STRUCTURES.

Structures in the crystalline state [11,12] may differ from those in solution, but are
precious as proof of the inclusive nature of the complexes and of their conformational
flexibility. For instance, 18-6 has Ci symmetry when uncomplexed and in the absence of
molecular environment, D3d symmetry when complexing K+ or NH4+, and C1 symmetry when
complexing Na+. The 222 cryptand has an "in-in" orientation of the nitrogen bridgeheads in
all structures, except in the bis-BH3 adduct, which is "out-out" and will be noted hereafter
222_OO. The free 222 (noted 222_II) is quite elongated, whereas its Na+, K+, NH4+, Cs+ and
Tl+ cryptates have a similar D3 form with converging N,O binding sites. With Ag+ or Ca2+ as
guests, 222 is distorted with a smaller cavity. No "in-out" form has been found in crystals,
although such forms are probably present in protic solvents [2]. For SC24, the two available
X-ray structures, NH4+/SC24 and Cl-/SC24,4H+, are "in-in-in-in" with similar cavity sizes,
but correspond to different conformations [11]. Since the structure of free SC24 could not be
solved, it is not clear from experiment whether SC24 has a preformed cavity or whether the
cavity is induced upon complexation. However, modelling studies indicate that various
"in"/"out" topomers of uncomplexed SC24 have a quasi-tetrahedral cavity of fairly constant
size [15].

Energy comparison of these forms, although in principle a straightforward procedure, suffers
from limitations related to the force field and to the optimisation procedures. Comparison of
MM and ab initio calculations on 18-6 in its Ci, D3d and C1 conformations illustrates the
first point [13,37,46]. There is agreement that the distorted C1 form is the least stable, but
not on the relative Ci/D3d stabilities. The discrepancy is mostly related to electrostatic
effects: the repulsive 1-4 O...O interactions in gauche OC-CO arrangements disfavor D3d
relative to Ci. Similarly, the relative conformational energies of 222 obtained with AMBER [14]
and MM2 [47] differ. It is therefore important to run several sets of calculations, varying
critical parameters such as the dielectric constant or the atomic charges, in order to assess
the origin and range of conformational preferences.

Relative stabilities and optimized structures also depend on the minimization procedure. For
instance, the conjugate gradients algorithm gave increasing stabilities for the Ca, Na and K
conformers of the 222 cryptand, in qualitative agreement with the expected strain induced in
the Ca2+, Na+ and K+ cryptates, respectively. However, Newton-Raphson optimizations converged
to the same minimum of D3 symmetry [14].

In some optimizations, we wanted to keep the "optimized" structure as close as possible to the
experimental one, in order to prevent similar structures from converging to the same minimum
and to compare the deformation energies induced by complexation. With the 222 cryptand, we
found that no conclusive results could be obtained by restraining the coordinates to their
initial values, because the results depend on the choice of the restraining force constant.
On the other hand, experimental structures have to be relaxed because of inhomogeneities in
bond lengths due to crystal disorder or to thermal effects in the crystal (for instance, the
C-O bonds of 18-6 average 1.36 Å at 300 K and 1.42 Å at 100 K [48]).
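
Why the outcome depends on the restraining force constant can be seen in a one-dimensional
toy model (an illustration, not the force field used here). If the force-field energy is
a*(x - x_mm)**2 and a harmonic restraint k*(x - x_ref)**2 pulls the coordinate toward its
X-ray value x_ref, the restrained minimum lies at (a*x_mm + k*x_ref)/(a + k), so both the
geometry and the resulting deformation energy change continuously with k. All parameter names
are hypothetical.

```python
def restrained_minimum(x_mm, a, x_ref, k):
    """Minimum of the toy energy a*(x - x_mm)**2 + k*(x - x_ref)**2.

    x_mm : position of the unrestrained force-field minimum (hypothetical)
    a    : stiffness of the force-field term
    x_ref: experimental (X-ray) value used as the restraint target
    k    : restraining force constant
    """
    # Setting dE/dx = 2a(x - x_mm) + 2k(x - x_ref) to zero:
    return (a * x_mm + k * x_ref) / (a + k)


def deformation_energy(x_mm, a, x_ref, k):
    """Force-field energy above its own minimum at the restrained geometry."""
    x = restrained_minimum(x_mm, a, x_ref, k)
    return a * (x - x_mm) ** 2


for k in (0.0, 0.1, 1.0, 10.0, 100.0):
    print(k, restrained_minimum(0.0, 1.0, 1.0, k), deformation_energy(0.0, 1.0, 1.0, k))
```

Any choice of k between "no restraint" and "frozen X-ray geometry" gives a different
deformation energy, which is why such numbers could not be compared conclusively.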

Finally, concerning this search for energy minima, it is worth mentioning that, whereas pieces
of macrocycles extracted from experimental structures are close to local energy minima,
model-built ones are not and need more extensive relaxation, combining for instance MM and MD
simulations.

MM optimisations of uncomplexed 222 and 18-6 confirmed that their free forms found in the
solid state (222_II and Ci, respectively) are more stable than those extracted from the
complexes, which indicates that these structures do not result from significant packing
forces. However, the energy difference is small (a few kcal/mole) compared to the interaction
energy with an ion (about 95 and 115 kcal/mole for the 18-6/K+ and 222/K+ complexes,
respectively). In other words, complexation induces a structural reorganization, but with no
significant strain, especially for the cation best "recognized" in solution (K+ for 18-6 and
222). Unlike crown ethers and cryptands, anisole spherands [49] display little conformational
flexibility and become preorganized for complexation during their chemical synthesis [3,6].

Optimisations of the alkali cation complexes of 18-6 in various conformations were able to
account for their structures: Na+ prefers the C1 form, K+ the D3d form in a nested position,
and Cs+ a perched position over the D3d ring, as found in the crystal [13]. Similarly,
optimisations of the 222/K+ cryptates starting from the 222_OO or 222_II forms converged to
the K form because of the electrostatic and steric strain induced by K+. With 222, however, no
K+/Na+ discrimination could be found among the K, Na and Ag conformers [14].

In these cation complexes, we calculate a complexation energy that decreases from Na+ to Cs+,
as observed in the gas phase for the binding of these ions to ether or amine binding sites. In
aqueous solution, however, the selectivity experimentally peaks at K+ for 18-6 and 222, at Rb+
for SC24, and at Na+ for the anisole spherand [2,5,6]. Similar peaks are calculated when one
subtracts the experimental hydration enthalpies of these ions from the calculated
complexation energies. However, such a naive procedure is unable to account for the binding
selectivity of one given ion towards different receptors [16,34] and could not reasonably be
used for prediction purposes.
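
The naive correction described above amounts to one subtraction per ion. The sketch assumes
both quantities are given in kcal/mol with negative values meaning favourable, and simply
reports the ion with the most negative corrected value; no numerical data are supplied here,
and the function name naive_selectivity is illustrative.

```python
def naive_selectivity(complexation_energy, hydration_enthalpy):
    """Rank ions by E_complexation - dH_hydration (kcal/mol; negative = favourable).

    Both arguments are dicts keyed by ion label, e.g. "Na+", "K+".
    Returns the ion with the most negative corrected value, and all values.
    """
    corrected = {ion: complexation_energy[ion] - hydration_enthalpy[ion]
                 for ion in complexation_energy}
    best = min(corrected, key=corrected.get)
    return best, corrected
```

Because the subtracted term depends only on the ion, it is the same for every receptor; it
can reorder the ions for one receptor but cannot change how one ion ranks among different
receptors.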

Fragments of experimental structures were used to model ammonium complexes of 18-6 and of its
lateral amide derivatives [49]. MM optimisations account for the binding selectivity
(primary > secondary > tertiary ammonium) observed in the gas phase and in solution, as well as
for the different modes of binding. However, when the structural complexity of the system
increases, e.g. with longer ammonium substrates or lateral arms, no conclusion could be drawn
because of the multiple minima problem [49]. In particular, the ability of that system to
display chiral recognition could not be determined.

Chiral recognition of ammonium substrates was studied with Cram's chiral receptor based on
binaphthyl units (Fig. 4) and its L/D complexes with (Phenyl)-+Gly(OMe) [49]. The complex with
the L substrate was built from the X-ray structure of the macrocycle with the D substrate.
Here, the MM optimisations of the complexes account for the small energy difference
(2 kcal/mole) involved in the chiral recognition displayed by this macrocycle.


Fig. 4 : Cram's chiral macrocycle.


HARMONIC  DYNAMICS.  LOW  FREQUENCY  NORMAL  MODES  OF  VIBRATION 

The low frequency normal modes of vibration give insight into molecular deformations of low
energy (less than 100 cm-1) and of quite large amplitude: atomic displacements of about 3 Å
can be achieved at less than 5 kcal/mole. The display of these vibrations on the PS300
graphics system using MDNM [50], either as static elongation vectors or as dynamic pictures,
revealed interesting qualitative features which may be related to the recognition properties
of these macrocyclic receptors [16].
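
A rough order-of-magnitude check of such amplitudes uses the classical harmonic oscillator,
E = 1/2 m ω² A² with ω = 2πc times the wavenumber. The sketch treats the mode as a single
oscillator of an assumed effective mass, which is a strong simplification of a collective
normal mode; with 5 kcal/mol, 20 cm-1 and 12 amu it gives a turning-point amplitude of roughly
5 Å, the same order as the displacements quoted above. The function name is illustrative.

```python
import math

AVOGADRO = 6.02214076e23
J_PER_KCAL = 4184.0
KG_PER_AMU = 1.66053906660e-27
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def classical_amplitude(energy_kcal_mol, wavenumber_cm, mass_amu):
    """Turning-point amplitude (Angstrom) of a harmonic oscillator.

    Uses E = 1/2 * m * omega**2 * A**2 with omega = 2*pi*c*wavenumber;
    mass_amu is an assumed effective mass for the whole mode.
    """
    e_joule = energy_kcal_mol * J_PER_KCAL / AVOGADRO   # energy per molecule
    omega = 2.0 * math.pi * C_CM_PER_S * wavenumber_cm  # angular frequency, rad/s
    m = mass_amu * KG_PER_AMU
    return math.sqrt(2.0 * e_joule / (m * omega ** 2)) * 1e10


print(round(classical_amplitude(5.0, 20.0, 12.0), 2))
```

Since the amplitude scales as the inverse of the frequency, halving the wavenumber doubles the
displacement reachable at a given energy, which is why the lowest modes dominate such
large-amplitude deformations.
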
The  type  of  motions:  qualitative  features. 

First, for the free receptors, from the monocyclic 18-6 to the tricyclic SC24, one of the
first three vibrational modes leads to an opening of the cavity and makes its binding sites
more accessible to the solvent or to an approaching substrate (Fig. 5). In other words, there
are normal modes of vibration of the free receptor which provide a path for substrate
inclusion. This feature is not restricted to the conformers extracted from cation complexes.
For instance, for 18-6, V7 and Vo of the D3d form, as well as V7 and V10 of the Ci form, or V7
and V9 of the C1 form, correspond to the same type of ring folding. For the bicyclic 222
cryptand, V8 and V9 of the 222_K form, V7 and V8 of the 222_II form, or V7 of the 222_OO form
are such that two bridges open/close and make the cavity less shielded. In neutral SC24, or in
tetraprotonated SC24,4H+ in the "in-in-in-in" form, one face of the tetrahedron formed by the
nitrogens opens while the opposite nitrogen approaches, producing a kind of breathing motion
of C3 symmetry.

Fig. 5 : Typical low frequency normal modes of vibration (orthogonal views) of free 18-6
(line 1) in its D3d, Ci and C1 forms, of the 18-6 Na+, K+, Cs+ complexes (line 2), of free
222_K, 222_II, 222_OO (line 3), and of 222/M+ (respectively K+, Na+, K+, Cs+) cryptates
(line 4).

In a barrel-type tricyclic topology, the "Barrel,6H+" receptor of NO3- (Fig. 2b) [67] displays
opening motions of the ammonium chains among the first vibrations. Such molecular motions are
probably a general feature of receptors; they have been characterized in larger biological
systems such as lysozyme and trypsin [51].

In the complexed macrocycles, similar vibrations are found, which provide paths for cation
extrusion and solvation; see for instance V7 for the 18-6/K+ complex. For the 222/Na+, K+ and
Cs+ cryptates, V9 is the characteristic mode (Fig. 5). Similarly, V8 of the SC24/K+ or
SC24/NH4+ cryptates is a "decomplexation mode" [34].

Vibrations of the ion inside the cages also reflect the ion/cage complementarity. Indeed, when
the cation fits into the cage, it is immobile in the lowest frequency modes; otherwise it moves
in a characteristic way. For instance, in V7 of the 18-6_D3d complexes, Na+, which is slightly
too small, undergoes in-plane librations of large amplitude; Cs+, which is too large,
oscillates along the C3 axis above the ring, whereas K+, which has the right size, does not
move (Fig. 5). In the 222 cryptates, the first motion for Na+ is a large oscillation along the
N...N axis inside the cage (in V12, 73 cm-1), whereas K+ is expelled through one face at higher
frequency (V15, 134 cm-1) and Cs+ has no significant motion.

Vibrational  frequencies. 

Quantitatively, we found these low frequencies to be rather insensitive to the electrostatic
representation of the molecule. The vibrational spectrum becomes richer in low frequencies from
18-6 to 222 and SC24. The reverse order might have been anticipated from the intuitive idea
that increased connectivity and rigidification would increase the vibrational frequencies, but
the results simply follow

Fig. 6 : Sum of the 10 lowest frequencies of vibration (in cm-1) for 18-6_D3d, 222_K and
SC24_N, free and in their Na+, K+, Rb+, Cs+ complexes.

Upon cation complexation, the receptor should be rigidified by the electrostatic and steric
strains induced by the cation. Fig. 6 displays, as an example, the sum (S10) of the ten lowest
frequencies for the free and complexed macrocycles. In all the complexes of 18-6, 222 and
SC24, S10 is higher for Cs+ than for Na+. There are, however, two interesting observations
related to the cation/crown complementarity. First, S10 is smaller for the Na+ complexes than
for the free macrocycles, probably because Na+ is too small and undergoes large amplitude
librations within these cages. Second, the evolution of S10 from Na+ to Cs+ depends on the size
and flexibility of the cage. SC24 is rather rigid with fixed dimensions [15], and its lowest
frequencies increase regularly. A similar trend was observed for the Li+, Na+ and K+ complexes
of a rigid "preorganized" anisole spherand [49]. 222 is more flexible and adjusts itself
around the cations: from K+ to Cs+ a plateau is observed. 18-6 forms inclusion-type complexes
with Na+ and K+, but Cs+, too big, sits over the ring, making a weaker complex, and the low
frequencies decrease from K+ to Cs+. Further investigations are needed to determine to what
extent these differences in dynamic behaviour are specific to these macrocycles.
Nevertheless, we feel that they express in some way differences in receptor/substrate
complementarity and in flexibility.
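
The S10 descriptor is simple to compute from a list of harmonic frequencies. The sketch assumes
the six rigid-body (translation and rotation) frequencies appear as values near zero and
discards everything at or below an assumed threshold of 1 cm-1; it also drops negative values,
which some programs use to flag imaginary frequencies, so it should only be applied to true
minima. The function name s10 is illustrative.

```python
def s10(frequencies_cm, n=10, zero_threshold=1.0):
    """Sum of the n lowest vibrational frequencies (cm-1).

    Values <= zero_threshold (rigid-body modes, or negative "imaginary"
    frequencies) are discarded before sorting.
    """
    vib = sorted(f for f in frequencies_cm if f > zero_threshold)
    if len(vib) < n:
        raise ValueError("fewer than n vibrational frequencies")
    return sum(vib[:n])
```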

MOLECULAR  DYNAMICS  SIMULATIONS  AT  300K 

Such simulations provide alternative insights into the conformational stabilities of the free
and complexed receptors, and into their flexibility as a function of the conformation of the
macrocycle and of the nature of the complexed ion, i.e. into the receptor/substrate
complementarity [16]. The root mean square atomic displacements (RMS) from the average
positions during the MD simulation were calculated to gauge the amplitude of the motions.
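
The RMS fluctuation can be sketched as follows, assuming the trajectory frames have already
been superimposed on a common reference so that overall rotation and translation do not
contribute. Averaging the per-atom values over an atom group or over the whole molecule is one
reasonable reading of the averaging described below; the function names are illustrative.

```python
import math


def rms_fluctuations(frames):
    """Per-atom RMS displacement from the average position over a trajectory.

    frames: list of frames, each a list of (x, y, z) tuples in Angstrom,
            assumed already superimposed on a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    rms = []
    for a in range(n_atoms):
        mean = [sum(f[a][c] for f in frames) / n_frames for c in range(3)]
        msd = sum(
            sum((f[a][c] - mean[c]) ** 2 for c in range(3)) for f in frames
        ) / n_frames
        rms.append(math.sqrt(msd))
    return rms


def group_average(rms, atom_indices):
    """Average RMS over a group of atoms (e.g. all oxygens, or the whole molecule)."""
    return sum(rms[i] for i in atom_indices) / len(atom_indices)
```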

First, in contrast to what we observed for the normal modes of vibration, there is a clear
effect of the electrostatic representation on these RMS's. In particular, the free
macrocycles simulated with charges of low polarity (qO = -0.3) retain their starting X-ray
conformation, but undergo conformational changes with large fluctuations when polar charges
are used (qO = -0.6). This is because gauche OC-CO arrangements become destabilized relative to
trans ones by repulsive 1-4 O...O interactions.
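
The sensitivity to the charge set follows from Coulomb's law: the 1-4 O...O term scales with
qO squared, so doubling |qO| from 0.3 to 0.6 multiplies the gauche-versus-trans repulsion
difference by four. The sketch uses a conversion constant of about 332 kcal·Å/(mol·e²),
assumed O...O distances of 2.9 Å (gauche) and 3.7 Å (trans), and no 1-4 scaling factor; real
force fields such as AMBER scale 1-4 electrostatics, so the absolute numbers are illustrative
only.

```python
COULOMB_KCAL = 332.06  # approx. kcal*Angstrom/(mol*e^2)


def coulomb(qi, qj, r, eps=1.0, distance_dependent=False):
    """Point-charge Coulomb energy (kcal/mol) at distance r (Angstrom)."""
    dielectric = eps * r if distance_dependent else eps
    return COULOMB_KCAL * qi * qj / (dielectric * r)


def gauche_minus_trans(q_o, r_gauche=2.9, r_trans=3.7, **kwargs):
    """Extra 1-4 O...O repulsion of a gauche OC-CO unit relative to trans.

    The default distances are assumed typical values, not data from this work.
    """
    return coulomb(q_o, q_o, r_gauche, **kwargs) - coulomb(q_o, q_o, r_trans, **kwargs)


for q in (-0.3, -0.6):
    print(q, round(gauche_minus_trans(q), 2),
          round(gauche_minus_trans(q, distance_dependent=True), 2))
```

The factor of four holds whatever dielectric model is chosen, since it comes only from the
product of the two oxygen charges.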

The  uncomplexed  macrocycles. 

Comparison of the RMS's for uncomplexed 18-6, 222 and SC24 in several conformations, and for
111, 211, 221 and 222 in the bicyclic series, demonstrates the effect of the conformation on
the molecular mobility. Again, considering connectivity criteria alone, one might have expected
to find decreasing mobility and RMS's from 18-6 to SC24. In the bicyclic series, rigidification
might similarly be expected going from 222 to 111. The calculated RMS's, averaged by atom groups
or over the whole molecules, show that this is not so. Among the conformers with the lowest
RMS's, one finds the order bicyclic 222_II (0.32 Å) < monocyclic 18-6_D3d (0.34 Å) < tricyclic
SC24_N (0.47 Å)! The largest RMS's are found for the three types of topologies, in the order
SC24_Cl (0.53 Å) < 222_K (0.54 Å) < 18-6_C1 (0.56 Å). In the bicyclic series, the 111 and 222
cryptands fluctuate by 0.23 and 0.32 Å, respectively. As expected, the "larger" molecule
fluctuates more than the "smaller" one. Nevertheless, as an effect of topology, size and
conformation, 221_K shows a singularly larger fluctuation (0.56 Å).

How can relative mobilities be rationalized? They might at first seem to depend on the
relative conformational stabilities. For 18-6 and 222, the most stable conformers (Ci, close
to D3d, and 222_II, respectively) indeed have the lowest fluctuations. Conversely, the C1 form
of 18-6, which has the largest RMS, is highest in energy, and SC24_Cl, less stable than SC24_N,
fluctuates more (0.53 and 0.47 Å, respectively). There are however exceptions, like 222_K,
which is more mobile than 222_OO (0.52 and 0.44 Å, respectively) although it is more stable by
5.4 kcal/mole. Similarly, among the three forms of 221 extracted from its Co2+, Na+ and K+
cryptates, 221_Co is the least stable compared to 221_Na or 221_K (4.4, 0.5 and 0.0 kcal/mole,
respectively), but the order of mobilities is reversed (0.33, 0.47 and 0.56 Å, respectively).
Here, the order follows the size of the cavity left by Co2+, Na+ and K+ (2.22, 2.65 and 2.91 Å
for the cation...N distances, and 2.16, 2.28 and 2.80 Å for the cation...O distances,
respectively [11]). This relation seems to be an interesting clue. The cavity size effect,
according to which conformers with the largest empty cavities display the largest
fluctuations, may also rationalize why the elongated 222_II is significantly less mobile than
222_K extracted from the K+ cryptate. The cavity of SC24 is larger than that of 222 and its
fluctuations are larger, although SC24 is more connected. Concerning its two forms, SC24_Cl
extracted from the SC24,4H+/Cl- cryptate is more mobile than SC24_N extracted from the
SC24/NH4+ cryptate, although both have similar cavity sizes. This may presumably be related to
the loss of the organizing effect of the four protons and of the anion in the former. In more
general terms, it thus appears that the conformer of the receptor induced by substrate binding
has a large mobility in the absence of the substrate, and that free macrocycles tend to fill
their cavities [41]. It should be stressed that, in this respect, the solvent will play a
particular role.

The  cation  complexes. 

In the cation complexes of 18-6, 222 and SC24 calculated with the polar set of charges, one
first observes a rigidification compared to the free receptor. For the K+ complexes of 18-6,
222_K and SC24_N, the RMS's drop by 0.21, 0.40 and 0.12 Å, respectively. There is, in
addition, a clear effect of cation/receptor complementarity on the relative motions of the
cation and of the cage. Indeed, from Na+ to Cs+, the fluctuations of the cage decrease in the
three macrocycles: from 0.33 to 0.28 Å in 18-6, from 0.40 to 0.23 Å in 222 and from 0.33 to
0.29 Å in SC24, as a result of the increased size of the ion. Although Na+ interacts more
strongly with the cages than K+ or Cs+, it does not rigidify them as much.

It is interesting to compare the relative mobility of the cations within the 222 and SC24
cryptates. In general, it decreases as the size of the ion increases, but Cs+, which is
somewhat more compressed in 222 than in SC24, fluctuates slightly less in 222 (0.14 and
0.16 Å, respectively). A dramatic difference is observed for the smallest ion: Na+ fluctuates
more than twice as much in SC24 (0.55 Å) as in 222 (0.25 Å)! We believe that this results from
the difference in flexibility of these cages: 222 is flexible not only in its ability to adopt
various conformations, but also in being able, in a given conformation, to adjust its size to
that of the substrate. On the other hand, the more connected SC24 is more rigid and keeps a
similar cavity size in its free state and in its inclusion complexes, but that cavity is too
large for Na+. In this context, it is also noteworthy that in the optimized Na+/222 cryptate,
Na+ is at the center and equidistant from the binding sites, whereas in the Na+/SC24 cryptate
Na+ moves away from the center to make closer contacts with three oxygens of the cage [15].
The flexibility of 222 reaches a limit, however, in the Ca2+ cryptate, in which the Ca2+-C
distances are longer than the sum of van der Waals radii [11].

The alkali cation complexes of 18-6_D3d, in contrast with those of 222 and SC24, display an
interesting irregular behaviour from Na+ to Cs+. This is because Na+ and K+ can fit into the
ring, whereas Cs+, too big for 18-6, sits and fluctuates over the ring. As a result, Cs+ is less
anchored than K+ and fluctuates more (0.28 and 0.24 Å, respectively). For 18-6, the RMS's peak
at Rb+, whose size allows fluctuations between "inclusive" and "perched" type positions.

MD simulations on ammonium complexes of 18-6, free and substituted by lateral amide arms [52],
modelled as the anchoring site of an "artificial enzyme", shed light on the possible importance
of mobility for efficient catalysis by such systems. Indeed, we calculate that amide fragments
increase both the stability and the mobility of R-NH3+ complexes compared to unsubstituted
18-6. It might have been anticipated that increased stability induces reduced mobility; above,
for the alkali cation complexes of 222 and SC24, we have seen that this is not so. In the
ammonium complexes of lateral amide derivatives of 18-6, the electric field of the amide
carbonyls indeed facilitates both the binding of the substrate and its oscillations away from
the crown ether oxygens. Catalytic behaviour requires the supermolecule to be flexible enough
to move from stable non-covalent "Michaelis complexes" to stabilized transition states
involving covalent binding between reactive centers of the receptor and of the substrate
[52,55]. Our MD results confirm that such an effect operates in "artificial enzymes" based on
macrocyclic units, in which enhanced stability is accompanied by increased mobility [52].
Biological catalytic systems might take advantage of that dynamic effect as well.

SAMPLING  OF  THE  CONFORMATIONAL  SPACE  USING  HIGH  TEMPERATURE 
ANNEALED  MD  SIMULATIONS. 

In the MD simulations at 300 K, run for 100 ps, no significant conformational change took place
for the 18-6, 222 and SC24 complexes. For free 18-6_Ci, unsymmetrical forms appeared within
400 ps, none of them being more stable than 18-6_Ci. In the bicyclic series, we paid particular
attention to the 222 cryptand and to the occurrence of nitrogen inversion at the bridgeheads.
In fact, starting from the 222_OO form, no inversion occurred within 500 ps using several N-C-N
force constants, whereas such "out" to "in" conversion takes place rapidly in solution [2]. It
involves a concerted reorganization of the bridges, which does not take place in these MD
simulations either. Whether this results from the too short simulation time or from the lack
of solvent molecules in these calculations has to be assessed by appropriate computer
experiments.

We tested with the 222 cryptand an alternative procedure to interconvert structures more
efficiently and to sample the conformational space, namely High Temperature Annealed MD
simulations [17,56]. First, the molecule is severely shaken by MD at high temperature (1000 K)
for 100 ps; then each of the 500 structures saved every 0.2 ps is energy minimized by MM,
reshaken at 300 K during 20 ps of MD to allow for enough relaxation, and finally reoptimized.
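
The protocol above (100 ps at 1000 K with a snapshot every 0.2 ps gives 100/0.2 = 500
structures) can be mimicked on a toy one-dimensional potential. In the sketch, overdamped
Langevin dynamics replaces Newtonian MD and steepest descent replaces the MM minimizer; the
potential, the temperatures (in the energy units of the toy potential), the step counts and
all function names are assumptions chosen for illustration, not the settings of this work.

```python
import math
import random


def energy(x):
    """Hypothetical 1-D multi-well potential standing in for a force field."""
    return 0.05 * x * x + math.sin(3.0 * x)


def grad(x):
    return 0.1 * x + 3.0 * math.cos(3.0 * x)


def minimize(x, step=0.05, tol=1e-8, max_iter=20000):
    """Steepest descent in place of an MM minimizer."""
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def langevin(x, kT, n_steps, rng, dt=0.01):
    """Overdamped Langevin dynamics, used here instead of Newtonian MD."""
    for _ in range(n_steps):
        x += -grad(x) * dt + math.sqrt(2.0 * kT * dt) * rng.gauss(0.0, 1.0)
    return x


def annealed_sampling(x0, n_snapshots=500, stride=20, relax_steps=200,
                      kT_high=1.5, kT_low=0.1, seed=0):
    """High-T trajectory -> quench each snapshot -> short low-T run -> re-quench."""
    rng = random.Random(seed)
    x = x0
    minima = []
    for _ in range(n_snapshots):
        x = langevin(x, kT_high, stride, rng)      # continue the "hot" trajectory
        y = minimize(x)                            # first minimization
        y = langevin(y, kT_low, relax_steps, rng)  # "reshaking" at low temperature
        minima.append(minimize(y))                 # final reoptimization
    return minima


minima = annealed_sampling(2.0)
distinct = sorted({round(m, 3) for m in minima}, key=energy)
print(len(distinct), distinct[:3])
```

Started from a local minimum, the collection of quenched structures typically also contains
deeper minima, which is the behaviour exploited for 222 below.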

Starting from several structures, including 222_OO, it was gratifying to find in the final
collection of 500 annealed structures the experimental 222_II form, which was calculated to be
the most stable among the X-ray conformers, as well as new, slightly more stable "in-in" ones.
Thus, this procedure is able to produce "the" global minimum, and it confirms that several
forms similar in shape (quite elongated and without cavity) are in equilibrium.

The disappointing result is the absence of any cryptate form in that set, and particularly of
the K conformer, which displays the highest complementarity for K+, the alkali cation best
recognized by 222. Several simulations were repeated including a gradual representation of
the complexed cation, either as a pure electrostatic driver (+1 dimensionless charge) or with
an electrostatic + steric representation which simulates K+. We found that the K form was
generated only in the presence of K+, although it is only a few kcal/mole higher in energy
than 222_II. This shows that particular forms of 222 acting as receptors and ionophores, like
those which bind Ca2+, Ag+ or Pb2+, have a low probability of being present in the absence of
their substrate, and that 222 is topologically, but not conformationally, preorganized for
cation complexation. More generally, in terms of drug design, this suggests that conformations
of a flexible drug able to bind to a receptor have a very low probability of being found
computationally if the supermolecule formed with the receptor is not considered explicitly.
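
One way to gauge what "a few kcal/mole higher in energy" means for populations is the
Boltzmann factor between two conformers. The sketch ignores entropy and degeneracy
differences, which in reality can matter as much as the energy gap; at 300 K a gap of
3 kcal/mol already corresponds to a population ratio below 1%. The function name is
illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def boltzmann_ratio(delta_e_kcal, temperature=300.0):
    """Population ratio p(upper)/p(lower) for an energy gap delta_e (kcal/mol).

    Entropy and degeneracy differences between the two forms are ignored.
    """
    return math.exp(-delta_e_kcal / (R_KCAL * temperature))


for gap in (1.0, 2.0, 3.0, 5.0):
    print(gap, f"{boltzmann_ratio(gap):.4f}")
```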

During the MD run at 1000 K, several "in"/"out" conversions take place, and the final set of
annealed structures contains populations of "in-in", "in-out" and "out-out" topomers. The
energy distribution of these classes shows a clear preference for converging orientations of
the bridgeheads. The populations increase in the order "out-out" < "in-out" < "in-in" (7%, 33%
and 61%, respectively), and the peaks of these classes (at 10.1, 8.7 and 6.2 kcal/mole above
222_II, respectively), as well as the lowest energy forms (at 4.4, 3.4 and -0.8 kcal/mole
relative to 222_II, respectively), confirm the preference for "in" forms.

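The "in"/"out" classification behind these populations can be expressed with the Cc-N-LP
angle used earlier for the distance geometry constraints (0° for a lone pair pointing into the
cavity, 180° for one pointing out). The sketch assumes a 90° cut-off between the two
orientations and that the cavity center and a lone-pair point are already available for each
bridgehead; both are assumptions, not the criterion documented in this work, and the function
names are illustrative.

```python
import math
from collections import Counter


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def bridgehead_orientation(cavity_center, nitrogen, lone_pair_point, cutoff=90.0):
    """'in' if the lone pair points toward the cavity (Cc-N-LP angle below cutoff)."""
    return "in" if angle_deg(cavity_center, nitrogen, lone_pair_point) < cutoff else "out"


def topomer_populations(structures, cutoff=90.0):
    """Percentage of 'in-in', 'in-out' and 'out-out' forms.

    structures: iterable of (cavity_center, [(N1, LP1), (N2, LP2)]) tuples.
    """
    counts = Counter()
    for cc, bridgeheads in structures:
        labels = sorted(bridgehead_orientation(cc, n, lp, cutoff) for n, lp in bridgeheads)
        counts["-".join(labels)] += 1
    total = sum(counts.values())
    return {label: 100.0 * c / total for label, c in counts.items()}
```
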
New minima of each topology are also found. Two "in-in" forms are slightly lower than 222_II,
three "out-out" forms are more stable than 222_OO extracted from a molecular adduct [11], and
there are several "in-out" forms less than 4 kcal/mole above 222_II. The most stable "in-in",
"in-out" and "out-out" forms are shown in Fig. 7.

Fig. 7 : Lowest energy "in-in", "in-out" and "out-out" conformers of 222 generated by the
High Temperature Annealed MD simulations.

Interestingly, most of the low energy forms have non-converging orientations of the oxygen
binding sites. Several indeed have a large "solvent accessible surface" [57], comparable to
that of a thia analogue of 222 whose heteroatoms point outside [58]. Such conformers, not
characterized in the solid state for 222 itself, should be significantly better hydrated than
the hydrophobic ones extracted from cation inclusion complexes, and should play a particular
role in ion capture as well as in the ionophoric behaviour. It is also worth noting that the
energy difference between stable "in" and "out" forms is small enough to be compensated by
hydrogen bonds with a protic solvent; these forms are therefore expected to be in equilibrium
in solution.

REACTION  PATHS  FOR  INCLUSION  OF  THE  SUBSTRATE  INTO  THE  RECEPTOR 

Conformational analysis of macrocyclic supermolecules should consider not only the extreme
states (free versus complexed macrocycle), but also the process of ion capture. The energy and
structure of the transition states for ion inclusion as a function of the substrate are of
particular interest. Experimental data for the complexation of alkali cations by cryptands in
solution show that complexation is fast compared to decomplexation, and driven by the
stability of the inclusion complex formed [59].

Modelling the complexation process in solution is a difficult task due to the conformational
changes of the receptor and to desolvation/solvation effects. Recently, Pohorille et al.
modelled the capture of Cl- by SC24,4H+ in water using MD simulations and found that the
desolvation of Cl- and the deformation of the receptor make the main contributions to the
barrier [60].

The reaction path and kinetics for the complexation of Na+ by 18-6 in vacuo have been
calculated using canonical variational transition-state theory [61].

We modelled the inclusion of alkali and NH4+ cations into 222 and SC24, and the inclusion of
halides into SC24,4H+ in the gas phase [15,34], using step-by-step MM optimisations and MD
simulations. Such calculations, although crude, shed light on intrinsic features influencing
the barrier: the size and charge of the ion, and the flexibility of the cage. Indeed, no
barrier is calculated for the inclusion of Na+ to Rb+ or of NH4+ into 222 and SC24. The Cs+
case appears to be the critical one: Cs+ enters SC24 with a 5 kcal/mole barrier, but with no
barrier into 222, despite the fact that the cavity of SC24 is somewhat larger. This "gas phase"
kinetic effect results mostly from the rigidity of SC24 compared to 222: the passage of Cs+
through a face of SC24 induces more strain and deformation energy.

For the Cl− and Br− anions captured by SC24·4H+, calculated energy barriers of
7.6 and 12.5 kcal/mol separate the external and inclusive complexes "in vacuo".
Whereas for Cl− the inclusive form is preferred over the exclusive one by 6.9 kcal/mol,
for Br− both forms have similar stabilities because of the strain induced by Br− in the
cryptate. The nature of the complexes in solution is not clear, because on the
NMR time scale they display an average tetrahedral symmetry [62]. However, given the relative
stability constants of the Cl− and Br− complexes (log Ks values of > 4 and < 1 in water,
respectively) and the results of the above simulations, it seems reasonable to
conclude that Cl− sits inside the cage whereas Br− is in rapid exchange over the
different faces of the tetrahedron.
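The stability constants quoted above can be converted into free energies with the standard relation ΔG° = −RT ln K = −2.303 RT log K. The Python sketch below applies it to the two bounds given in the text. The temperature of 298.15 K and the function name are our assumptions. Because only bounds on log Ks are known, the results are bounds as well.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def binding_free_energy(log_k: float, temperature: float = 298.15) -> float:
    """Standard free energy of complexation (kcal/mol) from a decimal log K.

    dG = -RT ln K = -RT ln(10) log10(K)
    """
    return -R_KCAL * temperature * math.log(10.0) * log_k


# Bounds from the text (water): log Ks > 4 for Cl-, log Ks < 1 for Br-.
dg_cl_upper = binding_free_energy(4.0)  # dG(Cl-) is more negative than this
dg_br_lower = binding_free_energy(1.0)  # dG(Br-) is less negative than this

print(f"dG(Cl-) < {dg_cl_upper:.2f} kcal/mol")
print(f"dG(Br-) > {dg_br_lower:.2f} kcal/mol")
print(f"Cl-/Br- selectivity > {dg_br_lower - dg_cl_upper:.2f} kcal/mol")
```

At 298 K, each decade of K corresponds to about 1.36 kcal/mol. The quoted bounds therefore imply a Cl−/Br− selectivity of more than about 4 kcal/mol in water.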

More generally, understanding and predicting the nature of the complexes as a
function of the partners and of the environment is a very challenging computational problem.
Whereas X-ray data confirm that the Cs+ complexes of 222 are of inclusion type, NMR
data in solution show that, depending on the solvent and on the counterion, the
complexes may be either "inclusive" or "exclusive" [66]. A more dramatic case is the
"Barrel"·6H+·7NO3− complex, where NMR data in aqueous solution provide
convincing evidence for the formation of an inclusive cryptate of 1:1 stoichiometry, but
where X-ray analysis of the crystal shows that the NO3− anions are coordinated externally
to the ammonium sites of the cage [67]! We are presently investigating
these problems using MD simulations.

CONCLUSION 

From the few applications seen above, it appears that modelling the conformations
and binding properties of macro(poly)cyclic molecules brings new or deeper insights
into the recognition properties of these species. The chemists who designed molecular
receptors [1-3] spent much time working with CPK models [29], and it might be hoped
that computer graphics systems will replace or assist that activity. Cram wrote: "From the
beginning, we used Corey-Pauling-Koltun molecular models, which served as a
compass on an otherwise uncharted sea of synthesizable target complexes. We have
spent hundreds of hours building CPK models of potential complexes and grading
them for desirability as research targets." [6] Such a modelling activity is presently far
from being a standard computerisable procedure. First, there are technical difficulties
such as the synchronous bond rotations needed to keep the (poly)cycles closed. Second,
the inspired way in which the structure is built and manipulated is hardly codable.
From the different techniques used to build selected conformers of "supermolecules" with
constraints and to sample their conformational space in vacuo, dynamic views of
the free receptors and of the concept of lock-and-key complementarity emerge.
Attempts are made to evaluate conformational and complexation energies, but care
must be taken about the significance of these numbers because of the theoretical
approximations used to simulate the "gas phase" situation. In addition, the relevance of
gas phase data for solution properties can be questioned. Conformational analysis in
solution should benefit from recent theoretical tools and increased computer power.
The potential of mean force for "effective" rotation barriers or intermolecular
interactions in solution will be calculated on prototype fragments and incorporated into
larger systems.

The prediction of structures in solution can in principle be achieved through MD
simulations that take the solvent into account explicitly. However, gas phase computer
experiments indicate that simulation times of a few hundred ps will be too short to allow
for significant conformational interconversions, and that considerable computer
resources will be required. A simpler task is the analysis of the
solvation pattern of given conformers of the solute. This problem is of particular interest
given the ionophoric properties of many synthetic macrocycles, which are analogous to those of
natural antibiotics [11]. In solid state structures, solvent molecules may be strongly
coordinated to the macrocycle or to its complexed ion [11, 12] and have to be
considered as part of the "supermolecule". Modelling the solvation pattern can in
principle be achieved through MC or MD techniques [59, 55]. We performed Monte Carlo
simulations on 18-6 hydrated in its D3d, Ci and C1 conformations [57]. These
simulations suggested that dissolution of the crystalline Ci form would lead to a
conformational change, and that the D3d conformer might be populated in solution. In
addition, the water structure around 18-6 was predicted for the three conformers of the
crown and was confirmed later by an X-ray study of an 18-6/PO4H3/6H2O complex [58].
Thus, even if solvation involves dynamic aspects, it is clear that some of the
solvent molecules are so firmly bonded that they are part of the structure. We are
presently simulating the solvation patterns of other prototype macrocyclic receptors.
DISCUSSION


GUILLOT - As far as small systems are concerned, the evaluation of absolute free
energies by various methods (perturbation methods, test particle method) is currently
the subject of debate. It now seems established that for neutral systems these
methods converge, provided that the calculation is pushed far enough. By contrast,
for ionic solutions the perturbation approach presents serious difficulties related to the
irreversibility of the path when a molecule is progressively charged. In this context, do
you think that your values of ΔΔG are meaningful, even if one can hope for a cancellation
of errors when the starting and final states are not too different?

WIPFF - This question refers to the calculation of relative binding affinities in solution
for closely related substrates bound to the same receptor (see references 19 to 25). I
do think that our values for Cl− versus Br− binding to SC24·4H+ are meaningful,
since they are nearly identical to the experimental values (reference 20). There seems
to be no particular problem if the mutation is performed at constant charge, as
we did. Because long-range interactions are in practice calculated with a cut-off
distance, mutations with variable charges might lead to erroneous relative energies,
but we kept the same charge along the mutations in water. Concerning the question of
reversibility, we performed the mutation in both directions ("forwards" and
"backwards") to ensure that there is no hysteresis. Errors related to the energy
representation, and particularly to the lack of polarization terms in the force field, seem
to cancel. The agreement between relative energies calculated without
polarization and the experimental values (e.g. relative solvation energies) is quite
good (see references 19 to 25 and those cited therein).


BUCKINGHAM - Your potential model includes coulombic interactions and hydrogen
bonds. Since the hydrogen bond may be considered to be electrostatic, how do you
define a hydrogen bond in your calculations?

WIPFF - Within the AMBER force field used, hydrogen bonding may be mimicked
simply by electrostatic interactions (using a 1-6-12 potential) with an appropriate
choice of charges, or by a 1-10-12 potential with slightly less polar charges.
Both may be calibrated on H-bonded systems to reproduce gas phase enthalpies of
complexation. However, as far as the dynamic behaviour is concerned, the different distance
dependences of the interactions lead to different dynamic properties for these
two representations. In the calculations reported here, we used a 1-6-12 potential with
no particular choice of H-bond partners. Alternative calculations using the 1-10-12
representation for specific H-bonded atom pairs have also been tested (references 15
and 18). More specifically, these pairs involved N...+HN, O...+HN, N...HN, O...HN and
H...OH contacts.
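The sketch below writes out the two functional forms so that the distinction is concrete. In AMBER-style force fields, the 10-12 term is used for designated H-bond pairs in place of the 6-12 term, and the Coulomb term is kept. The well depth and position (5 kcal/mol at 1.9 Å) are hypothetical values, chosen only so that both forms share the same minimum. They are not AMBER parameters.

```python
COULOMB_KCAL = 332.06  # e^2/(4 pi eps0) in kcal*Angstrom/mol (approximate)


def coulomb(qi: float, qj: float, r: float, dielectric: float = 1.0) -> float:
    """Coulomb energy (kcal/mol) for point charges qi, qj (in e) at distance r (Angstrom)."""
    return COULOMB_KCAL * qi * qj / (dielectric * r)


def lj_6_12(a: float, b: float, r: float) -> float:
    """1-6-12 van der Waals term: A/r^12 - B/r^6."""
    return a / r**12 - b / r**6


def hbond_10_12(c: float, d: float, r: float) -> float:
    """1-10-12 hydrogen-bond term: C/r^12 - D/r^10."""
    return c / r**12 - d / r**10


# Hypothetical well: depth EPS (kcal/mol) at R0 (Angstrom), for illustration only.
EPS, R0 = 5.0, 1.9
A, B = EPS * R0**12, 2.0 * EPS * R0**6          # 6-12 form with minimum -EPS at R0
C, D = 5.0 * EPS * R0**12, 6.0 * EPS * R0**10   # 10-12 form with minimum -EPS at R0

for r in (1.7, 1.9, 2.4, 3.0):
    print(f"r = {r:.1f} A: 6-12 {lj_6_12(A, B, r):7.3f}   10-12 {hbond_10_12(C, D, r):7.3f}")
```

When both forms are calibrated to the same minimum, the 10-12 well is narrower. At 2.4 Å it is less attractive than the 6-12 form, which is one way the different distance dependences can show up in the dynamics.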


DEVILLERS - Among the 500 conformations you found by molecular dynamics on your
system after minimization, did you find any that you could consider
identical?

WIPFF - We looked at this question only for experimentally known conformers. Indeed,
when the High Temperature Annealed Simulations were performed on the free 222
cryptand, the experimental form li appeared twice. In the simulations on the 222/H+
cryptate, the Ag structure appeared 3 times, and in the simulations on the 222/K+
cryptate, the K conformer appeared 43 times.
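The text does not say how conformers were compared. One common way to decide whether two minimized conformers are "identical" is to compute their RMSD after optimal superposition (the Kabsch algorithm). The sketch below is a generic illustration of that technique, not the authors' procedure. The 0.1 Å threshold and the toy coordinates are arbitrary choices. Because only proper rotations are allowed, a conformer and its mirror image are not superposed, which bears on the chirality question discussed below.

```python
import numpy as np


def kabsch_rmsd(p, q) -> float:
    """RMSD between two conformers after optimal superposition (Kabsch algorithm).

    p, q: (N, 3) coordinates with the same atom ordering. Only proper rotations are
    allowed, so a conformer and its mirror image are NOT superposed onto each other.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)                      # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)           # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))      # -1 would mean a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # optimal proper rotation
    diff = p @ rot.T - q
    return float(np.sqrt((diff**2).sum() / len(p)))


def same_conformer(p, q, threshold: float = 0.1) -> bool:
    """Hypothetical identity criterion: RMSD below a chosen threshold (Angstrom)."""
    return kabsch_rmsd(p, q) < threshold


# Toy 4-atom "conformer", a rotated and translated copy, and its mirror image.
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
c, s = np.cos(0.5), np.sin(0.5)
rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rz.T + np.array([3.0, -1.0, 2.0])
mirror = ref * np.array([-1.0, 1.0, 1.0])

print(same_conformer(ref, moved))   # True: rigid motion only
print(same_conformer(ref, mirror))  # False: mirror image
```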


PEPE - The crystal conformation of the cryptand is not representative of the solution
conformations. Could this lack of correspondence be related to the charge distribution
used in the coulombic energy calculation, since these molecules bear several
heteroatoms?

WIPFF - The structures found experimentally in crystals of the 222
cryptates are not representative of the conformation observed in solution for the 222
cryptand, first because of complexation and environment effects: the conformations of
cryptates (i.e. inclusion complexes) are induced by the complexed cation and have
converging orientations of the N,O binding sites. Second, because of dynamic effects
in solution, the symmetrical structure observed on average on the NMR time scale
results from an equilibrium between non-symmetrical forms. Our simulations on the
free 222 show that structures with cavities have a low probability of being present in
vacuo. Instead, structures with no cavities, or with non-converging binding sites, are
found for free 222 in vacuo, and there are good reasons to predict that they should be
well hydrated and populated in solution.

As far as the electrostatic representation of the system is concerned, it is clear that
cation complexation polarizes the C-O and C-N bonds, and that the dynamic behavior
depends on the charges used in the calculations (see reference 16). Our set of
charges for free 222 reasonably describes the gas phase behaviour (see text).
Whether solvation would formally increase these charges (via polarization effects) or
decrease them (via the dielectric constant) is not clear.


DYMEK - When you monitored d2 and d1 as a function of time for one of your
receptors, what was the rationale behind choosing those particular parameters? Also,
what kind of information can you get (or what conclusion can be drawn) from the
degree of correlation between the d2 and d1 trajectories?

WIPFF - This question deals with the complex between cadaverine +H3N(CH2)nNH3+
(n=5) and its synthetic ditopic receptor (J.-M. Lehn et al., J. Am. Chem. Soc. 1981,
103, 4266; J. Chem. Soc., Chem. Commun. 1981, 833 and 1982, 657). We correlate the
deformation of the cage with the motion of various substrates (n=3 to 7) and show that
the correlation coefficient peaks for the substrate that is bound most selectively
(n=5). The choice of the distances we correlated comes from the analysis of the
low frequency normal modes of vibration in the complexes: the first mode indicates
synchronous motions of the receptor and of the substrate (P. Auffinger, G. Wipff,
unpublished results; P. Auffinger, DEA 1987, Strasbourg).
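The degree of correlation between two distance trajectories is usually quantified by the Pearson correlation coefficient. The minimal sketch below computes it for two series d1(t) and d2(t). It uses synthetic data, a shared oscillation plus independent noise, because the authors' trajectories are not reproduced here. The time axis and amplitudes are hypothetical.

```python
import numpy as np


def trajectory_correlation(d1, d2) -> float:
    """Pearson correlation coefficient between two distance time series d1(t), d2(t)."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    if d1.shape != d2.shape:
        raise ValueError("trajectories must have the same number of frames")
    return float(np.corrcoef(d1, d2)[0, 1])


# Synthetic example: a common low-frequency oscillation plus independent noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 100.0, 1001)        # time in ps (hypothetical sampling)
mode = np.sin(2.0 * np.pi * t / 25.0)    # shared "breathing" motion
d1 = 6.0 + 0.4 * mode + 0.1 * rng.standard_normal(t.size)
d2 = 8.0 + 0.3 * mode + 0.1 * rng.standard_normal(t.size)
print(f"r(d1, d2) = {trajectory_correlation(d1, d2):.2f}")
```

A coefficient close to +1 indicates synchronous motions of the two distances. Values near 0 indicate that they fluctuate independently.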


MEHANI - It appears that you have successfully generated, using molecular dynamics,
the solid state crystal structure conformer of the cryptand [2.2.2]-potassium ion complex
as a low energy conformer. If this structure has a mirror image conformer, does
your molecular dynamics study generate both mirror image conformers equally?

WIPFF - The K conformer of 222 is not generated in the simulations on free 222. It
appears in the High Temperature Annealed MD Simulations only when K+ is modelled
in the cage. Although in principle both mirror image conformers should appear with
equal probabilities in a purely random process, the 43 K forms generated in fact all have
the same chirality. This is because the K+ cation prevents such interconversions on the
time scale of 100 ps in vacuo. It is thus likely that in solution interconversion between
mirror images of 222 occurs via decomplexation-complexation processes. It is also
noteworthy that in the crystalline state, the NH4+ and K+ cryptates of 222 display
different enantiomeric forms of the cage.




Modelling  of  Molecular  Structures  and  Properties.  Proceedings  of  an  International  Meeting, 
Nancy,  France,  11-15  September  1989,  J.-L.  Rivail  (Ed.) 

Studies  in  Physical  and  Theoretical  Chemistry,  Volume  71,  pages  165-171 
©  1990  Elsevier  Science  Publishers  B.V.,  Amsterdam  —  Printed  in  The  Netherlands 
SUMMARY

For the Fe+/Fe2+ self-exchange reaction in solution, it has been shown that
the transition state is not reached starting from the reactants' region
unless a conveniently designed sampling potential is employed. The free
energy barrier has been calculated using two different techniques. The best
value, obtained using statistical perturbation theory, is 176.5 kJ/mol.

INTRODUCTION 

The classical treatment (refs. 1-2) of electron transfer reactions
generally uses an activated complex formalism. According to the Franck-Condon
principle (ref. 3), internuclear distances and nuclear velocities do not change during
an electron transition. Therefore, within this classical framework, electron transfer
must occur at the intersection region S* of the diabatic potential
hypersurfaces Hpp and Hss, corresponding respectively to the precursor and
successor complexes. This region S* is reached by a suitable fluctuation in
the nuclear configuration of the precursor complex. The electronic coupling
matrix element between the two diabatic states is assumed to be large enough
for the reactants to be converted into products with unit probability in the
intersection region, but small enough to be neglected in
calculating the energy required to reach S* (ref. 4).

In this paper we focus our attention on the outer-sphere self-exchange
reaction

$$\mathrm{Fe^{+} + Fe^{2+} \rightleftharpoons Fe^{2+} + Fe^{+}}$$

in water. In this kind of process, if the two metal ions are held fixed, the
nuclear configurations are completely defined by the solvent coordinates. A
statistical technique such as the Monte Carlo method is adequate to generate
these configurations. However, the fluctuations suitable for the electron
transfer process have a very low probability, and it is practically impossible
to generate them starting from an equilibrium distribution of the
solvent around the reactants (refs. 5-7).
Our purpose in this paper has been twofold. First, we show that
even with a small cluster of 50 water molecules around both ions, S* is not
reached unless a conveniently designed sampling potential is employed (refs.
8-9).

Second, the free energy barrier is needed to measure the probability that the system
reaches the transition state region S*, relative to the probability of
being in the reactants' region. For this purpose, we have used two different techniques
to calculate the ΔF‡ of the Fe+/Fe2+ exchange reaction in an aqueous solution including
125 water molecules.

METHODOLOGY 

In this paper we have studied the self-exchange reaction between Fe+ and
Fe2+ ions both in a cluster of 50 water molecules and in a
dilute aqueous solution including 125 water molecules. In the simulations, the
two ions are held fixed at a distance of 7 Å. When simulating a dilute aqueous
solution, the size of the box has been chosen so that the 125 water
molecules give a density of about 1 g/cm3, and periodic boundary conditions
under the minimum image convention have been used. Solvent
configurations were generated with the Monte Carlo method (ref. 10) using the Metropolis
algorithm (ref. 11). Pairwise additive potential functions have been used to calculate
the energies of the diabatic hypersurfaces Hpp and Hss. We have used the MCY potential
(ref. 12) for the water-water interaction, an ab initio analytical potential generated by us
for the Fe+-H2O interaction (ref. 13), and an empirical potential for the
Fe2+-H2O interaction (ref. 14). In all simulations, the system was first equilibrated,
and the statistical analysis was then carried out over additional
configurations.
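Three ingredients of this set-up can be checked or sketched in a few lines. The first is the edge of a cubic box holding 125 water molecules at 1 g/cm3, which comes out at about 15.5 Å. The second is the minimum image convention and the third is the Metropolis acceptance test. The sketch assumes a molar mass of 18.015 g/mol for water and kT in kJ/mol. Trial-move generation and the MCY and ion-water potentials are not reproduced here.

```python
import math
import random

AVOGADRO = 6.02214076e23  # 1/mol
M_WATER = 18.015          # g/mol


def cubic_box_length(n_water: int, density_g_cm3: float = 1.0) -> float:
    """Edge (Angstrom) of a cubic box holding n_water molecules at the given density."""
    volume_cm3 = n_water * M_WATER / (AVOGADRO * density_g_cm3)
    return (volume_cm3 * 1e24) ** (1.0 / 3.0)  # 1 cm^3 = 1e24 A^3


def minimum_image(dx: float, box: float) -> float:
    """Fold one Cartesian separation into [-box/2, box/2] (minimum image convention)."""
    return dx - box * round(dx / box)


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Metropolis test: always accept downhill moves, uphill ones with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


print(f"box edge for 125 H2O at 1 g/cm3: {cubic_box_length(125):.2f} A")
```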

The following procedure was used to identify which of the Monte Carlo generated
configurations belong to the intersection hypersurface S*. For each solvent configuration
obtained for the precursor complex (Fe+/Fe2+)aq, two solvation energies were calculated:
that of this complex, and that of the successor complex (Fe2+/Fe+)aq. The latter was
obtained by keeping the water molecule coordinates unchanged but replacing the Fe+ and
Fe2+ ions by Fe2+ and Fe+, respectively. The difference
between these two solvation energies gives the change ΔEsolv in solvation
energy between Hpp and Hss for the frozen solvent configuration. Only
configurations with ΔEsolv = 0 are rigorously isoenergetic when the electron
transfer takes place, and they therefore correspond to S*.
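The classification step can be sketched as a filter on the energy gap of each frozen configuration. In the sketch below, the two solvation-energy functions are hypothetical stand-ins for the MCY and ion-water pair potentials. They use a toy Born-like model in which a configuration is reduced to two numbers (a, b) measuring how strongly the solvent solvates ion sites 1 and 2. Swapping the charges while keeping (a, b) fixed mimics the frozen-solvent electron transfer. The 5 kJ/mol tolerance is the practical criterion used in the text, and tol = 0 is the strict S* condition.

```python
from typing import Callable, Iterable, List, Tuple

Config = Tuple[float, float]  # toy configuration: (a, b) solvation strengths at sites 1 and 2


def toy_solvation(config: Config, q1: int, q2: int) -> float:
    """Hypothetical stand-in for the pair potentials: -(q1^2 * a + q2^2 * b)."""
    a, b = config
    return -(q1**2 * a + q2**2 * b)


def e_precursor(config: Config) -> float:
    return toy_solvation(config, 1, 2)  # Fe+ at site 1, Fe2+ at site 2


def e_successor(config: Config) -> float:
    return toy_solvation(config, 2, 1)  # charges swapped, solvent frozen


def delta_e_solv(config: Config, e_pp: Callable, e_ss: Callable) -> float:
    """Energy gap for one frozen solvent configuration: E_ss - E_pp."""
    return e_ss(config) - e_pp(config)


def isoenergetic_subset(configs: Iterable[Config], e_pp: Callable, e_ss: Callable,
                        tol: float = 5.0) -> List[Config]:
    """Keep configurations with |Delta E_solv| <= tol (tol = 0 is the strict S* condition)."""
    return [c for c in configs if abs(delta_e_solv(c, e_pp, e_ss)) <= tol]


configs = [(100.0, 101.0), (100.0, 150.0), (120.0, 118.5)]
print(isoenergetic_subset(configs, e_precursor, e_successor))
```

In this toy model the gap is 3(b − a), so only nearly symmetric solvation (a ≈ b) passes the test. This is consistent with the symmetric character of the S* configurations discussed in the results.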
RESULTS AND DISCUSSION

We first present the results obtained for the cluster of 50 water
molecules in the simulations in which the total energy was computed with
the Hpp potential. This corresponds to generating equilibrium nuclear
configurations of the solvent around Fe+ and Fe2+ in the precursor complex.
For practical purposes, the wider criterion |ΔEsolv| ≤ 5 kJ/mol has been
adopted to classify a given configuration as isoenergetic. To
include the effect of temperature, we have performed simulations at 298 K, 350
K and 500 K. At each temperature, we generated 1,000,000 configurations
to achieve equilibration, and the corresponding analysis was
done over 1,000,000 additional configurations. The energetic
results of these statistical computations are presented in Table 1.

TABLE 1

Mean energy values of the cluster containing 50 water molecules at different
temperatures.

T/K     E (kJ/mol)
298     -3471.8
350     -3327.2
500     -3062.3


The mean energy values show that regions of higher energy are populated as the
temperature is increased. Nevertheless, no
configuration corresponding to S* has been identified. In this particular
case, even with an exaggerated criterion of |ΔEsolv| ≤ 20 kJ/mol, we do not
find any isoenergetic configuration.

A very efficient stratagem for generating configurations corresponding to
the transition state consists in employing a sampling potential of the form (ref. 8):

$$U_\lambda = (1-\lambda)\,H_{pp} + \lambda\,H_{ss} \qquad (1)$$

with the parameter λ equal to 1/2. In this way, we have carried out a
simulation at 298 K on the cluster of 50 water molecules. 2,000,000
configurations were needed to achieve equilibrium, and the analysis
was done over 800,000 additional configurations. Using this potential,
we have been able to identify a large number of transition state
configurations with the above-mentioned criterion |ΔEsolv| ≤ 5 kJ/mol.
The number of isoenergetic structures is about 66,400 (8.3%). This
ensemble of configurations has a mean energy of -3319.7 kJ/mol, which
lies in the same energy range as the one populated by generating
configurations with the Hpp potential at 350 K (see Table 1). However, no
isoenergetic structure was found in that previous study, whereas a significant
percentage is obtained with this sampling technique.
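The effect of the sampling potential of eqn (1) can be illustrated with a one-dimensional toy model, which is not the authors' 50-molecule cluster. Two diabatic parabolas, Hpp and Hss, depend on a single hypothetical solvent coordinate x, and energies are in units of kT. The force constant K and the tolerance are illustrative assumptions. Metropolis sampling on Hpp alone (λ = 0) essentially never visits the crossing region, while sampling on U with λ = 1/2 visits it routinely.

```python
import math
import random

K = 20.0   # hypothetical force constant of the toy diabatic parabolas (kT per unit x^2)
KT = 1.0   # energies in units of kT


def h_pp(x: float) -> float:
    """Toy precursor surface, minimum at x = -1."""
    return K * (x + 1.0) ** 2


def h_ss(x: float) -> float:
    """Toy successor surface, minimum at x = +1."""
    return K * (x - 1.0) ** 2


def u_lambda(x: float, lam: float) -> float:
    """Eqn (1): U_lambda = (1 - lambda) Hpp + lambda Hss."""
    return (1.0 - lam) * h_pp(x) + lam * h_ss(x)


def fraction_isoenergetic(lam: float, tol: float = 2.0, n_equil: int = 20_000,
                          n_prod: int = 200_000, step: float = 0.1, seed: int = 1) -> float:
    """Metropolis-sample U_lambda and return the fraction of samples with |Hss - Hpp| <= tol."""
    rng = random.Random(seed)
    x = -1.0                       # start in the reactants' well
    u = u_lambda(x, lam)
    hits = 0
    for i in range(n_equil + n_prod):
        x_new = x + rng.uniform(-step, step)
        u_new = u_lambda(x_new, lam)
        if u_new <= u or rng.random() < math.exp(-(u_new - u) / KT):
            x, u = x_new, u_new
        if i >= n_equil and abs(h_ss(x) - h_pp(x)) <= tol:
            hits += 1
    return hits / n_prod


for lam in (0.0, 0.5):
    print(f"lambda = {lam}: isoenergetic fraction = {fraction_isoenergetic(lam):.4f}")
```

With λ = 1/2 the two surfaces enter symmetrically, so the sampled distribution is centred on the crossing point. This is the qualitative reason why the mixed potential yields many isoenergetic configurations.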

The large number of configurations classified as isoenergetic enables us
to study the structural and energetic characteristics of the transition state
of our reaction. We computed the radial distribution functions of oxygen atoms
around the two metal ions Fe+ and Fe2+ and found that both functions
are nearly identical. The corresponding first peaks are located at 2.20 Å.
This distance corresponds to a mean solvation structure between those found for the Fe+
and Fe2+ ions in the precursor complex at equilibrium. These structural
trends agree with the fact that the sampling potential used tends
to favour symmetric configurations close to S*. Finally, it should be
remarked that the transition state for this reaction cannot be associated with
a unique well-defined structure. In fact, the configurations belonging to S*
show a large geometric dispersion as well as an energy dispersion,
as shown in Fig. 1.


Fig. 1. Histogram of the percent distribution of isoenergetic
configurations at 298 K against their energy E (kJ/mol).
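A radial distribution function such as the oxygen g(r) around each ion can be computed by histogramming ion-oxygen distances over many configurations. The histogram is then normalised by the shell volume and the mean density. The sketch below assumes a cubic periodic box with the minimum image convention, as in the 125-molecule solution. For an isolated cluster, the minimum image step would be dropped and the normalisation would differ, but the first-peak position is read off the same way. All names are illustrative.

```python
import numpy as np


def radial_distribution(center, frames, box_length, r_max, n_bins):
    """Average g(r) of oxygen atoms around one fixed ion in a cubic periodic box.

    center: (3,) ion position; frames: iterable of (N, 3) oxygen coordinates (Angstrom).
    r_max should not exceed box_length / 2 for the minimum image convention to hold.
    """
    center = np.asarray(center, dtype=float)
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)
    n_frames = 0
    n_oxygen = 0
    for xyz in frames:
        d = np.asarray(xyz, dtype=float) - center
        d -= box_length * np.round(d / box_length)      # minimum image
        r = np.linalg.norm(d, axis=1)
        counts += np.histogram(r, bins=edges)[0]
        n_frames += 1
        n_oxygen = len(d)
    density = n_oxygen / box_length**3                   # mean oxygen number density
    shell = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    g = counts / (n_frames * density * shell)
    centres = 0.5 * (edges[1:] + edges[:-1])
    return centres, g
```

For randomly placed oxygens, g(r) fluctuates around 1. For a solvated ion, the first maximum of g(r) gives the first-shell distance, for example the 2.20 Å reported above.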

The strategy explained above allows us to analyse the nature of the
transition state, but not to evaluate its probability starting from the
reactants' region. For this reason, we have applied two different methodologies
to the dilute aqueous solution including 125 water molecules with
periodic boundary conditions.

The first methodology is based on the parabolic behaviour of the free
energy surface with respect to the solvent polarization coordinate assumed in
Marcus' theory (ref. 15). Here we have obtained the variation of ΔF
with respect to the ΔEsolv parameter, taken as the reaction coordinate. The
extrapolation to ΔEsolv = 0 gives the activation free energy barrier. We have
performed the simulations at 298 K, generating 3,000,000 configurations for
equilibrating the system, followed by 2,000,000 structures for the analysis.
As the configurations were generated in the Monte Carlo process, we
sorted them according to ΔEsolv. We obtained configurations
over a wide range of ΔEsolv values: from ΔEsolv = 891 kJ/mol, where 5
configurations are found (2.5 × 10⁻⁴ %), to ΔEsolv = 481 kJ/mol, with 21
configurations (about 10⁻³ %). As might be expected, none of them had
ΔEsolv ≈ 0. Collecting the generated configurations in intervals of 10 kJ/mol
along the reaction coordinate ΔEsolv, we have identified the reactants'
region with the most populated interval. This is the interval
centred at 695.5 kJ/mol, to which 145,790 configurations (7.29%) belong.
If P stands for the number of configurations in each interval and
Pmax is the value for the reactants' interval, we define:

$$\Delta F = -RT \ln \frac{P}{P_{\max}} \qquad (2)$$

The factor exp(−ΔF/RT) expresses the probability that the system will be in
a given interval of the reaction coordinate ΔEsolv, relative to the
probability of being in the reactants' interval. We have fitted the following
parabolic function by least squares to the values of ΔF
calculated with the above expression:

$$\Delta F = a\,(\Delta E_{\mathrm{solv}})^2 + b\,\Delta E_{\mathrm{solv}} + c \qquad (3)$$

ΔF follows ΔEsolv very closely as a parabola, as Marcus' theory had hypothesized
(ref. 15) and as other authors have recently shown for related systems (refs. 7-9, 16).
The mean absolute deviation obtained is only 0.705 kJ/mol. Extrapolating to ΔEsolv = 0,
we obtained a value of 207.3 kJ/mol for the free energy barrier ΔF‡. This value gives a
relative probability of about 10⁻³⁷. This number shows why it was so difficult to reach
the S* region starting from reactants at equilibrium.
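The histogram route of eqns (2) and (3) can be sketched as follows: bin the sampled ΔEsolv values, convert counts to ΔF, fit a parabola by least squares and evaluate it at ΔEsolv = 0. The authors' samples are not available. The test below therefore uses synthetic Gaussian data, for which eqn (2) gives an exact parabola, so the extrapolated barrier can be checked analytically. Bins with fewer than 10 samples are dropped, which is our assumption. The last function reproduces the order of magnitude quoted above for 207.3 kJ/mol.

```python
import numpy as np

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def free_energy_profile(delta_e, bin_width=10.0, temperature=298.0, min_count=10):
    """Eqn (2): Delta F = -RT ln(P / Pmax) for bins of the reaction coordinate.

    Returns bin centres and Delta F (kJ/mol) for bins holding at least min_count samples.
    """
    delta_e = np.asarray(delta_e, dtype=float)
    lo = np.floor(delta_e.min() / bin_width) * bin_width
    edges = np.arange(lo, delta_e.max() + bin_width, bin_width)
    counts, edges = np.histogram(delta_e, bins=edges)
    centres = 0.5 * (edges[1:] + edges[:-1])
    keep = counts >= min_count
    df = -R_KJ * temperature * np.log(counts[keep] / counts.max())
    return centres[keep], df


def extrapolated_barrier(centres, df):
    """Least-squares fit of eqn (3); returns (Delta F at x = 0, mean absolute deviation)."""
    a, b, c = np.polyfit(centres, df, 2)
    mad = float(np.mean(np.abs(np.polyval([a, b, c], centres) - df)))
    return float(c), mad


def relative_probability(barrier, temperature=298.0):
    """exp(-Delta F / RT): probability of the S* interval relative to the reactants' interval."""
    return float(np.exp(-barrier / (R_KJ * temperature)))


print(f"exp(-207.3 kJ/mol / RT) at 298 K = {relative_probability(207.3):.1e}")
```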

A more accurate result can be obtained with statistical perturbation
theory (ref. 17). This approach is based on eqn (4), which expresses the
free energy difference between systems 0 and 1 as an average of a function of
their energy difference:

$$\Delta F = F_1 - F_0 = -RT \ln \left\langle \exp\left[-\frac{U_1 - U_0}{RT}\right] \right\rangle_0 \qquad (4)$$
The average is taken over configurations sampled in system 0, so system 1 is treated as a
perturbation. In this paper we compute the free energy associated with changing
U0 = Hpp (unperturbed system 0) into the potential U0.5 = 1/2 Hpp + 1/2 Hss
(perturbed system 1), which favours the appearance of isoenergetic structures.
To avoid large perturbations, the coupling parameter λ of eqn (1) has been
used to transform the potential smoothly. Simulations have therefore been run
for λi = 0.0, 0.1, 0.2, 0.3, 0.4 and 0.5, perturbing in both directions,
λi → λi+1 and λi → λi−1, except at the two end points. Double-wide sampling was
used (ref. 18). In all, six simulations have been performed. Each run
consisted of an equilibration phase of 1,000,000 configurations followed by
averaging over 1,000,000 additional configurations. The computed free energy
changes are presented in Table 2.

TABLE 2

Computed free energy differences ΔF (kJ/mol) for the transformation from the
unperturbed to the perturbed system.

λi      λj      ΔF (i→j)    ΔF (j→i)
0.0     0.1      68.1        62.9
0.1     0.2      59.4        49.1
0.2     0.3      34.7        35.9
0.3     0.4      19.1        12.8
0.4     0.5       9.1         1.8
Total           190.4       162.5
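The estimator of eqn (4) and the double-wide trick can be sketched in a few lines. Because U_λ in eqn (1) is linear in λ, the energy difference to a neighbouring window is simply ±Δλ (Hss − Hpp). A single run at λi therefore provides both the step to λi + Δλ and the step to λi − Δλ. The log-sum-exp shift is a numerical precaution of ours. The function names are illustrative, and this is not the authors' program.

```python
import numpy as np


def fep_delta_f(delta_u, kT):
    """Eqn (4): Delta F = -kT ln < exp(-(U_1 - U_0) / kT) >_0.

    delta_u: samples of U_1 - U_0 collected while sampling state 0 (same units as kT).
    A log-sum-exp shift keeps the exponentials from overflowing.
    """
    x = -np.asarray(delta_u, dtype=float) / kT
    shift = x.max()
    return float(-kT * (shift + np.log(np.mean(np.exp(x - shift)))))


def double_wide(gap_samples, dlam, kT):
    """Free energy steps to lambda_i + dlam and lambda_i - dlam from one run at lambda_i.

    With U_lambda linear in lambda (eqn 1), U_{lambda +/- dlam} - U_lambda = +/- dlam * (Hss - Hpp),
    so gap_samples are values of Hss - Hpp recorded during the run at lambda_i.
    """
    gap = np.asarray(gap_samples, dtype=float)
    return fep_delta_f(dlam * gap, kT), fep_delta_f(-dlam * gap, kT)
```

For Gaussian-distributed ΔU with mean μ and variance σ², eqn (4) gives exactly μ − σ²/(2kT), which is a convenient check of any implementation.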


It is interesting to note that the two independent runs, one in each direction of λ,
give different values. However, this hysteresis is
not very significant given the large free energy barrier
involved in the reaction. In any case, to avoid systematic errors, the average
of the values obtained in both directions, 176.5 kJ/mol, is adopted
as the most reliable value for the free energy barrier of the Fe+/Fe2+ self-exchange
reaction in solution.
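The totals in Table 2 and the adopted average can be re-derived directly. The short script below uses only the window values printed in the table.

```python
# Window free energies from Table 2 (kJ/mol), lambda = 0.0 -> 0.5 in steps of 0.1.
forward = [68.1, 59.4, 34.7, 19.1, 9.1]   # column i -> j
backward = [62.9, 49.1, 35.9, 12.8, 1.8]  # column j -> i

total_forward = sum(forward)
total_backward = sum(backward)
barrier = 0.5 * (total_forward + total_backward)  # average of both directions
hysteresis = total_forward - total_backward

print(f"forward total  = {total_forward:.1f} kJ/mol")
print(f"backward total = {total_backward:.1f} kJ/mol")
print(f"average        = {barrier:.2f} kJ/mol (hysteresis {hysteresis:.1f} kJ/mol)")
```

The average is 176.45 kJ/mol, quoted as 176.5 kJ/mol in the text. The 27.9 kJ/mol difference between the two directions is the hysteresis discussed above.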

Finally, it should be noted that although the extrapolation method
gives a somewhat exaggerated value in this case, it provides a reasonable
approximation to the value obtained with the more precise method.
SUMMARY

BLEMO, a molecular mechanics computer program operating with the MM2
parametrization, is used to design new kinds of radical traps involving
intramolecular addition of the radical to a double bond. The 2-norbornenyl series is
studied in particular. The data show that substitution at positions 5 and 7 with bulky groups,
as well as structures of the dimethanonaphthalene type, leads to very short distances between
reactive sites, and thus probably to efficient radical clocks.


INTRODUCTION 

The use of radical clocks is a very convenient way to check for the presence of radical intermediates during organic reactions (ref. 1). In particular, alkyl halide precursors of cyclizable radicals have been successfully applied to mechanistic studies (ref. 2,3). The efficiency of this kind of radical clock is governed by its cyclization rate (i.e., the rate of intramolecular addition of the radical to a double bond): the faster the rearrangement, the greater the chance of trapping a radical intermediate. We recently synthesized (ref. 4-6) 5-(endo)-i-propylsulfonyl-5-(exo)-substituted-2-norbornenes, which lead to cyclization rates near 10⁸-10⁹ s⁻¹ for the corresponding C-centred α-sulfonyl radicals 1 (Scheme 1). We want to monitor the occurrence of short-lived radical intermediates during reactions for which single electron transfer (S.E.T.) has been postulated, in particular when solvent-caged radical pairs are involved (ref. 4,5,7). For this purpose we aim to design and synthesize free radical clocks of increasing efficiency, i.e. with higher rates of cyclization.

Scheme 1. Intramolecular cyclization of radical traps of the 5-(endo)-i-propylsulfonyl-5-(exo)-substituted-2-norbornene series (ref. 5). (Structures not reproduced; b: R = C6H5.)
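Elementary kinetics shows why a faster clock is more useful. Assume, purely for illustration, that the radical either cyclizes (first-order rate constant k_c) or disappears through one lumped first-order process with a hypothetical rate constant k_loss, for instance collapse of a caged radical pair. The fraction of radicals that cyclize is then k_c/(k_c + k_loss). Only the 10⁸-10⁹ s⁻¹ range of k_c comes from the text. The value of k_loss and the function name are assumptions.

```python
def fraction_cyclized(k_c: float, k_loss: float) -> float:
    """Fraction of radicals that cyclize when first-order cyclization (k_c, s^-1)
    competes with a single lumped first-order loss channel (k_loss, s^-1).
    Lumping all competing processes into one channel is a simplifying assumption."""
    if k_c < 0 or k_loss < 0:
        raise ValueError("rate constants must be non-negative")
    total = k_c + k_loss
    return k_c / total if total > 0 else 0.0


# Hypothetical competing process with a 1 ns lifetime (k_loss = 1e9 s^-1).
K_LOSS = 1.0e9
for k_c in (1.0e8, 1.0e9, 1.0e10):
    print(f"k_c = {k_c:.0e} s^-1 -> fraction cyclized = {fraction_cyclized(k_c, K_LOSS):.2f}")
```

A tenfold gain in k_c changes the cyclized fraction most strongly when k_c and k_loss are of comparable size. That is the regime in which a faster clock actually pays off.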

Our previous results raised the question: why does replacing a cyano group by a phenyl group in the 5-exo position of 2-norbornene increase the cyclization rate of radical 1 (Scheme 1)? X-ray structure determinations and force field calculations (ref. 8) on both parent H compounds (ref. 6) suggest that this enhancement is mainly due to the structural constraints imposed by the phenyl group. In particular, the change in the valence angle C4-C5-S8 (117.5° and 108.6° for the cyano and phenyl substituents, respectively) pushes the whole endo substituent toward the double bond of the norbornenyl group. This brings the reactive sites, C9 and C3, closer together. Taking these results into account, we decided to start from the norbornenyl structure and investigate new substitutions able to modify the geometrical parameters, and so to give access to an extended range of cyclization rates. We examined the influence of substitution at positions 5-exo and 7, and the interest of dimethanonaphthalenes (Scheme 2), with the aim of decreasing the minimal distance between the reactive centres of the radical trap.


METHODOLOGY 

The geometry of each molecule was optimized with BLEMO (ref. 9), a molecular mechanics program (ref. 8a) operating with the MM2 parametrization (ref. 10). With this parametrization, BLEMO calculates the same energy as MM2 for a given geometry, as was verified for various test molecules. In BLEMO we decided to take all van der Waals interactions into account at each step of the minimization. In MM2, by contrast, these interactions are restricted to distances r < r_max = √12 Å in the intermediate calculations. We preferred to calculate the energy in a strictly identical manner during the minimization and in the final energy calculation at the optimum. Interestingly, this choice shifts the optimal geometry and the corresponding energy only very slightly (by about 0.1 kcal/mol). In addition, BLEMO uses internal coordinates as the geometric variables for the optimization, whereas MM2 uses Cartesian coordinates. The starting geometry is built automatically, which simplifies the entry of the molecule. The program also reports the energy localization in the form of internal energies and interaction energies of predefined fragments of the molecule.
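The sketch below contrasts two ways of summing van der Waals pairs: evaluating every pair (the BLEMO choice) and skipping pairs with r ≥ r_max in intermediate evaluations (as described for MM2). The exp-6 form and its constants (2.90 × 10⁵, 12.50, 2.25) are assumed to follow the MM2 van der Waals term of ref. 10 and should be checked against that reference. The single atom type, the values of eps and r_star, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def mm2_vdw(r: float, eps: float, r_star: float) -> float:
    """Assumed MM2-style exp-6 term:
    E = eps * (2.90e5 * exp(-12.50 * r / r_star) - 2.25 * (r_star / r)**6)."""
    return eps * (2.90e5 * math.exp(-12.50 * r / r_star) - 2.25 * (r_star / r) ** 6)


def vdw_energy(coords, eps, r_star, excluded=frozenset(), r_max=None):
    """Sum the van der Waals term over atom pairs (i < j) not listed in `excluded`.

    r_max=None evaluates every pair ('all interactions at each step');
    a finite r_max skips pairs at or beyond that distance, mimicking a cutoff.
    """
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if (i, j) in excluded:
            continue  # e.g. 1-2 and 1-3 pairs, which bonded terms handle
        r = math.dist(coords[i], coords[j])
        if r_max is not None and r >= r_max:
            continue
        total += mm2_vdw(r, eps, r_star)
    return total


# Hypothetical four-atom example (angstrom, kcal/mol).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.2, 0.0), (4.0, 4.0, 0.0)]
full = vdw_energy(coords, eps=0.044, r_star=3.8)
cut = vdw_energy(coords, eps=0.044, r_star=3.8, r_max=math.sqrt(12.0))
print(f"all pairs: {full:.4f}  with cutoff: {cut:.4f}  difference: {full - cut:.4f}")
```

With every pair always included, the energy function is identical during the minimization and at the final optimum. The text gives this as the reason for BLEMO's choice.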

After the optimization, the conformation of lowest total energy is obtained. To find the minimal distance between the reactive sites (C3 and C9) for this structure, a rotation around the C5-S8 bond is performed without further minimization (fig. 1). The rotation brings the C9 atom into the plane of C3, C5 and S8 (dihedral angle C3-C5-S8-C9 = 0°).

(Fig. 1 panels: A, conformation corresponding to a minimal total energy; B, conformation corresponding to a minimal C3-C9 distance (d3,9min), with dihedral angle 3,5,8,9 = 0°.)

fig. 1 Example of rotation around the C5-S8 bond (from the conformation corresponding to the lowest total energy) leading to the minimal distance between C3 and C9 (d3,9min). Depending upon R5, the 9-CH3 is more or less staggered with respect to the C5-R5 bond in A, whereas it is clearly transoid in B.
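The rotation that gives d3,9min can be written with elementary vector algebra:

1. Compute the dihedral angle φ = C3-C5-S8-C9.
2. Rotate C9 rigidly about the C5-S8 axis by -φ.
3. Measure the C3-C9 distance.

For a rigid rotation about that axis, the C3-C9 distance is smallest exactly when the dihedral is 0°, because both atoms then lie on the same side of the axis in one plane. The sketch is an illustration, not BLEMO code. The coordinates are hypothetical, and only C9 is moved; in a full structure every atom beyond S8 would be rotated by the same angle.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in radians, in (-pi, pi]."""
    b0 = sub(p0, p1)
    b1 = unit(sub(p2, p1))
    b2 = sub(p3, p2)
    v = sub(b0, scale(b1, dot(b0, b1)))  # p0 offset perpendicular to the axis
    w = sub(b2, scale(b1, dot(b2, b1)))  # p3 offset perpendicular to the axis
    return math.atan2(dot(cross(b1, v), w), dot(v, w))


def rotate_about_axis(p, a, b, theta):
    """Rotate point p by theta (radians, right-handed) about the axis a -> b (Rodrigues)."""
    k = unit(sub(b, a))
    v = sub(p, a)
    c, s = math.cos(theta), math.sin(theta)
    v_rot = add(add(scale(v, c), scale(cross(k, v), s)), scale(k, dot(k, v) * (1.0 - c)))
    return add(a, v_rot)


def minimal_distance(c3, c5, s8, c9):
    """Rotate C9 about C5-S8 until dihedral C3-C5-S8-C9 = 0; return (new C9, d3,9min)."""
    phi = dihedral(c3, c5, s8, c9)
    c9_new = rotate_about_axis(c9, c5, s8, -phi)
    return c9_new, math.dist(c3, c9_new)


# Hypothetical coordinates in angstrom (not taken from the paper).
C3, C5, S8, C9 = (1.5, 1.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.8), (-1.0, -1.2, 2.5)
c9_min, d_min = minimal_distance(C3, C5, S8, C9)
print(f"d3,9min = {d_min:.3f} A")
```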

RESULTS  AND  DISCUSSION 

If the norbornene is substituted in position 7 with alkyl groups (CH3, i-C3H7), phenyl groups or spiro systems such as (-CH2-)4 and (-CH2-)5, the minimal distance available between C3 and C9 decreases as the valence angle 3,4,5 decreases (table 1). Bulky groups repel each other and close the norbornene. In contrast, substitution with a three-membered spiro ring ((-CH2-)2) is too constrained and opens the bicyclic system (angle 3,4,5 increases).

When the hydrogen in position 5 is replaced by an alkyl group (CH3, C2H5), the methylsulfonyl group is pushed toward the double bond. The valence angle 4,5,8 decreases, and as a consequence the distance d3,9min also decreases (table 2). The intracyclic valence angles of the bicycle are not noticeably changed. If ethyl substitution at position 5 is combined with i-propyl groups on the methylene bridge, a very short distance between C3 and C9 (2.36 Å) can be obtained (table 3).
TABLE 1

Evolution of the minimal distance between C3 and C9 (d3,9min) and of the angle 3,4,5 with the substitution at position 7 of the norbornenyl entity.

R7        d3,9min (Å)   angle 3,4,5 (°)
(CH2)2    2.62          110.1
H         2.59          109.8
(CH2)4    2.57          108.9
C6H5      2.57          109.1
(CH2)5    2.55          108.4
Me        2.55          108.4
iPr       2.51          107.3
TABLE 2

Evolution of the minimal distance between C3 and C9 (d3,9min) and of the angle 4,5,8 with the substitution at position 5-exo of the norbornenyl entity.

R5    d3,9min (Å)   angle 4,5,8 (°)
H     2.59          112.8
Me    2.51          111.9
Et    2.48          110.1

TABLE 3

Geometrical parameters of the 5-ethyl-7-di-i-propyl norbornenyl entity.

d3,9min = 2.36 Å
angle 3,4,5 = 107.8°
angle 4,5,8 = 111.2°

TABLE 4

Minimal distance between C3 and C9 (d3,9min) and value of the angle 4,5,8 for dimethanonaphthalenes.

fig. 2 Diagram showing the relationship between the steric hindrance of the substituent in the 5- or 7-position and the geometrical modifications of the norbornenyl entity. Steric hindrance is measured by Δr°v, the corrected van der Waals parameter, Δr°v = r°v(R) - r°v(H) (ref. 14). The geometrical quantities are the minimal distance between C3 and C9 (d3,9min) and the angle 4,5,8 (variation of R5) or 3,4,5 (variation of R7).


Fig. 2 shows the influence of the steric hindrance of the 5- and 7-substituents on d3,9min, and also on the exocyclic angle (4,5,8) or the intracyclic angle (3,4,5). Van der Waals radii are taken as the measure of substituent size.
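The trend summarized in fig. 2 can be checked numerically on the tabulated values. The sketch below computes a Pearson correlation coefficient and a least-squares slope between d3,9min and the relevant angle: angle 3,4,5 for Table 1 and angle 4,5,8 for Table 2. These numbers only describe a handful of computed points (seven and three); they are not a statistical test. The helper names are hypothetical.

```python
import math

# Table 1 (position 7): R7 -> (d3,9min / angstrom, angle 3,4,5 / degree)
TABLE_1 = {
    "(CH2)2": (2.62, 110.1), "H": (2.59, 109.8), "(CH2)4": (2.57, 108.9),
    "C6H5": (2.57, 109.1), "(CH2)5": (2.55, 108.4), "Me": (2.55, 108.4),
    "iPr": (2.51, 107.3),
}
# Table 2 (position 5-exo): R5 -> (d3,9min / angstrom, angle 4,5,8 / degree)
TABLE_2 = {"H": (2.59, 112.8), "Me": (2.51, 111.9), "Et": (2.48, 110.1)}


def pearson_and_slope(xs, ys):
    """Pearson r and least-squares slope dy/dx for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy), sxy / sxx


def angle_vs_distance(table):
    """Correlate the tabulated angle (x) with d3,9min (y)."""
    angles = [a for _, a in table.values()]
    dists = [d for d, _ in table.values()]
    return pearson_and_slope(angles, dists)


r1, slope1 = angle_vs_distance(TABLE_1)
r2, slope2 = angle_vs_distance(TABLE_2)
print(f"position 7: r = {r1:.3f}, slope = {slope1:.4f} A/deg")
print(f"position 5: r = {r2:.3f}, slope = {slope2:.4f} A/deg")
```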

Dimethanonaphthalene systems were also investigated. In this case the small value of the valence angle 4,5,8 is responsible for the short distances observed between the carbon atoms concerned. The substituent R14 (table 4) has only a small effect on d3,9min. Through steric repulsion with the sulfonyl group, however, it might favour conformations better suited to the intended cyclization. A methyl group in position 6 changes the distance between C3 and C9 more strongly (table 5), and two conformations of close energy must then be considered:

In the first conformation, R6 and S8 are eclipsed (dihedral angle R6,6,5,8 = 0°). The resulting repulsion between the methyl and methylsulfonyl groups decreases the dihedral angle 6,4,5,8 and increases d3,9min.

In the second conformation, R6 and S8 are slightly shifted from each other; 6,4,5,8 increases and d3,9min decreases.
TABLE 5

Influence of a 6-methyl group on the dimethanonaphthalene. Cf1 and Cf2 refer to the two conformations of lowest energy (Etot is the total energy).

R6         d3,9min (Å)   6,4,5,8 (°)   R6,6,5,8 (°)   Etot (kcal/mol)
H          2.44          -119.6        -2.3           75.8
Me (Cf1)   2.53          -123.3         0             78.8
Me (Cf2)   2.40          -120.0        -3.8           79.2
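Assume that the two 6-methyl conformers Cf1 and Cf2 of Table 5 are in Boltzmann equilibrium, and take their steric energies as free energies. This approximation ignores entropy and vibrational contributions. Their relative populations then follow from the 0.4 kcal/mol difference in Etot. The temperature of 298.15 K and the function name are assumptions. Only conformers of the same molecule are compared: Etot of the parent compound (R6 = H) cannot be compared with the values for the methyl derivative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def boltzmann_populations(energies_kcal, temperature=298.15):
    """Normalized Boltzmann populations for conformer energies given in kcal/mol."""
    e_min = min(energies_kcal)
    weights = [math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal]
    z = sum(weights)
    return [w / z for w in weights]


# Etot of Me(Cf1) and Me(Cf2) from Table 5.
cf1, cf2 = boltzmann_populations([78.8, 79.2])
print(f"Cf1: {cf1:.2f}, Cf2: {cf2:.2f}")

# Relative Boltzmann weight of a conformation 5 kcal/mol above the minimum,
# the order of magnitude mentioned by M. Fathallah in the discussion.
print(f"{math.exp(-5.0 / (R_KCAL * 298.15)):.1e}")
```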

CONCLUSION

This study shows that molecular mechanics calculations are an efficient tool for exploring how substitution modifies the structure of 5-(endo)-alkylsulfonyl-2-norbornenyl compounds. The results point to the value of bulky groups in positions 5 and 7, and of dimethanonaphthalenes, for bringing carbon atoms 3 and 9 closer together; these are the reactive sites in radical clocks of the norbornenyl series. Similar calculations were performed on molecules in which the sulfone was replaced by a carbonyl function or a methylene group, and the same general trends were found. This study therefore suggests the synthesis of new kinds of radical clocks of increased efficiency. Introducing bulky groups in the 7-position of norbornene is synthetically difficult (ref. 11, 12), whereas substitution in the 5-position seems easier. We are performing experiments in this direction and in the dimethanonaphthalene series (ref. 13).
DISCUSSION


DEVILLERS - What software did you use for these calculations?

M. FATHALLAH - It is a molecular mechanics program called BLEMO, written at ESIPSOI (Marseille) by B. Blaive and M. Fathallah. The parametrization is that of Allinger's MM2 program, supplemented by our own work.


DEVILLERS - What does this program offer beyond MM2?

M. FATHALLAH -

.  BLEMO is written in Fortran 77, but its input and output are very simple, which makes it very easy to use.

.  It is a fully automated program; if you do not have a starting geometry, it builds one.

.  The minimization method is the relaxation or "step-by-step" method.

.  The minimization is performed on internal coordinates, not on Cartesian coordinates.

.  BLEMO has been extended to the study of the iron atom and of other heteroatoms.
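The answers above describe the minimizer only as a relaxation or "step-by-step" method working on internal coordinates. A generic coordinate-wise step relaxation of that kind might work as follows:

1. Move each variable in turn by ±step.
2. Keep a move only if the energy decreases.
3. Halve the step when no variable can be improved.

This sketch shows only the general family of such methods; it does not describe BLEMO's actual algorithm. The function name and the toy energy surface are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def step_relaxation(energy: Callable[[Sequence[float]], float],
                    q0: Sequence[float], step: float = 0.1,
                    tol: float = 1e-6, max_sweeps: int = 100000) -> Tuple[List[float], float]:
    """Minimize `energy` over variables q (e.g. internal coordinates) by trial steps."""
    q = list(q0)
    e = energy(q)
    for _ in range(max_sweeps):
        if step <= tol:
            break
        improved = False
        for i in range(len(q)):
            for delta in (step, -step):
                trial = q.copy()
                trial[i] += delta
                e_trial = energy(trial)
                if e_trial < e:          # keep only downhill moves
                    q, e = trial, e_trial
                    improved = True
                    break
        if not improved:
            step /= 2.0                  # refine the step when no move helps
    return q, e


def toy_energy(q):
    """Coupled quadratic standing in for a steric energy surface; minimum at (1, -0.5)."""
    x, y = q
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2 + 0.5 * (x - 1.0) * (y + 0.5)


q_opt, e_opt = step_relaxation(toy_energy, [0.0, 0.0])
print(q_opt, e_opt)
```

The method needs only energy evaluations, no derivatives. That makes it easy to apply directly to internal coordinates, at the cost of many more energy evaluations than gradient-based methods require.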


DEVILLERS - How can one obtain this program?

M. FATHALLAH - Simply write to B. Blaive or M. Fathallah. In addition, a Mac version will be available within two months.


BRUNETIERE - To what extent do the energies of the different conformations come into play?

M. FATHALLAH - For the exocyano and exophenyl compounds 1 and 2, the energy differences between the most stable conformation and the one corresponding to the minimal distance are relatively small (about 5 kcal mol⁻¹) and close to each other. The populations of radicals residing at the minimal distance are therefore similar for 1 and 2.
SUMMARY

Differential thermal analysis has been used to study the phase behaviour of cyclohexane, C6H12, and deuterated cyclohexane, C6D12, in both closed and open DTA wells. The aim of the open-well experiment was to investigate the possible influence of dissolved helium on the phase diagram. Both substances exhibit rotator and non-rotator phases as well as new high-pressure-induced phase transitions.

A six-centre Lennard-Jones potential is proposed for cyclohexane and employed in molecular dynamics computations. Preliminary calculations of transport properties and rotational behaviour give a realistic prediction of the phase transition of liquid C6H12 to the plastic phase. However, the solid-state points correspond rather to an amorphous phase. The potential was therefore slightly adjusted to make it compatible with a stable fcc structure.


INTRODUCTION 

Cyclohexane is a well-known representative of orientationally disordered ("plastic") crystals. At 186 K and atmospheric pressure it undergoes a solid-state transition in which the monoclinic crystal structure (solid II) changes to a high-temperature plastic phase (solid I) of fcc structure (ref. 1). Metastable forms have also been detected, depending on thermal treatment (ref. 2) and sample size (ref. 3). New pressure-induced phase transitions have previously been found both in cyclohexane, C6H12 (ref. 4), and in deuterated cyclohexane, C6D12 (ref. 5). Whereas C6H12 shows only one new high-pressure phase (denoted as solid III), two solid phases (III and IV) are observed in C6D12. The enthalpy changes for the low-temperature transitions solid II - III, solid II - IV and solid IV - III are much smaller than those for the transitions to the plastic phase solid I. We therefore suppose that the high-pressure forms III and IV are not plastic phases.

Apart from a recent neutron scattering experiment (ref. 6), no spectroscopic studies at high pressures exist yet that would allow us to elucidate the crystal structure and molecular dynamics of the new high-pressure phases. On the other hand, computer simulations are increasingly employed to analyze crystal structure transformations as well as the features of rotator phases (ref. 7). We have therefore performed molecular dynamics (MD) calculations to gain more insight into the dynamic properties of the various phases of cyclohexane. Preliminary results (ref. 8) are encouraging and suggest further calculations.

EXPERIMENTAL 

Differential thermal analysis (DTA) has been used to study the phase behaviour of C6H12 and C6D12. In general, transition temperatures are determined from DTA heating runs at a rate of 2 K/min. Details of the high-pressure set-up have been described previously (refs. 4,9). Pressure is generated by compressed argon, nitrogen or helium. The substance under investigation is enclosed in a small indium container. Its walls transmit the pressure but prevent the compressed gas from dissolving in the substance. It is well known that solutions of compressed gases in organic compounds at high pressures may change the phase behaviour (ref. 10) and conformational equilibria (ref. 11). The neutron scattering study mentioned in the Introduction (ref. 6) was performed under helium pressure. We therefore reinvestigated the phase situation of C6H12 and C6D12 using open DTA wells, with helium as the pressure-transmitting medium.

Within the limits of experimental error (±0.5 K), no changes of the transition temperatures were observed. For both C6H12 and C6D12, the DTA peak due to the solid III - I transition sometimes appeared slightly broadened, but no additional transition peak could be resolved. However, after several repeated runs the solid-solid transitions at lower temperatures split (C6H12: II - III; C6D12: IV - III and, very seldom, also II - IV). The longer the measuring time and the slower the heating rate, the more frequently such splittings were observed. Examples are shown in Fig. 1. It should be mentioned, however, that in the case of C6H12 the splitting was only observed after repeated runs over several days. Such long runs could possibly introduce impurities into the open DTA wells that favour the occurrence of metastable phases (ref. 12).

Whether this finding is an artefact or an indication of further solid phases is still an open question. The possible existence of a fifth solid phase has recently been proposed in the context of high-pressure neutron scattering studies (ref. 13).

MD CALCULATIONS

For the MD calculations, the method of constrained dynamics was applied with the Störmer-Verlet algorithm for numerical integration, as described in previous papers (refs. 14,15). A six-centre Lennard-Jones (LJ) potential is used for C6H12, each CH2 group being represented by a single LJ centre. This is an approximation, particularly for close-packed crystals. Nevertheless, such a model potential has proved successful in predicting the phase transition from the liquid state to the rotator phase (ref. 8). Further refinements of the model potential will certainly be necessary to explain the complicated high-pressure phase behaviour.
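The six-centre LJ model can be sketched as follows. Each molecule is an idealized rigid chair whose six sites alternate ±h above and below the mean ring plane, with a nearest-site separation of 1.78 Å (Table 1). The intermolecular energy is the sum over all 36 site-site pairs. The following are assumptions for illustration:

- a puckering height h = 0.25 Å, which gives a site-site-site angle of about 112°;
- σ = 3.97 Å and ε/k = 78 K as read from Table 1;
- the function names.

In an MD run, the forces integrated by the Störmer-Verlet scheme would be the gradient of this energy.

```python
import math

K_B_MOLAR = 8.314462618e-3  # molar gas constant, kJ mol^-1 K^-1


def chair_sites(center, d_nn=1.78, h=0.25):
    """Six LJ sites (one per CH2) of an idealized rigid chair, in angstrom.
    Sites alternate +h/-h about the mean ring plane, whose normal is along z."""
    r_ring = math.sqrt(d_nn ** 2 - (2.0 * h) ** 2)  # puts neighbouring sites d_nn apart
    cx, cy, cz = center
    sites = []
    for k in range(6):
        ang = math.radians(60.0 * k)
        dz = h if k % 2 == 0 else -h
        sites.append((cx + r_ring * math.cos(ang), cy + r_ring * math.sin(ang), cz + dz))
    return sites


def pair_energy(sites_a, sites_b, eps_over_k=78.0, sigma=3.97):
    """Sum of 4*eps*[(sigma/r)^12 - (sigma/r)^6] over all site pairs, in kJ/mol."""
    eps = K_B_MOLAR * eps_over_k
    energy = 0.0
    for a in sites_a:
        for b in sites_b:
            sr6 = (sigma / math.dist(a, b)) ** 6
            energy += 4.0 * eps * (sr6 * sr6 - sr6)
    return energy


sites = chair_sites((0.0, 0.0, 0.0))
for sep in (5.0, 6.03, 7.0, 8.0):
    # second molecule translated along x, with the same orientation
    print(f"{sep:5.2f} A: {pair_energy(sites, chair_sites((sep, 0.0, 0.0))):10.3f} kJ/mol")
```

The energy depends strongly on the relative orientation of the two rings. For that reason the scan above, which uses a single fixed orientation, says nothing by itself about the fcc lattice of a rotator phase.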

First results of the computations have been reported (ref. 8), and some technical details are summarized in Table 1. With the potential described above, we found that the fcc lattice of our model C6H12 is not stable over a period of about 100 ps. This is shown by the plots of the radial pair correlation function, g(r), of the centres of mass given in Fig. 2a. Apparently the structure changes towards an amorphous solid during the MD computation of about 100 ps.
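The radial pair correlation function g(r) of the molecular centres of mass, used above to diagnose the loss of fcc order, can be computed from one configuration in a cubic periodic box with the minimum-image convention. The sketch normalizes pair counts by the ideal-gas expectation, so an uncorrelated (random) configuration gives g(r) ≈ 1. In practice g(r) would be averaged over many saved configurations. The function name and the test configuration are hypothetical.

```python
import math
import random


def radial_distribution(positions, box, n_bins=50, r_max=None):
    """g(r) for points in a cubic periodic box of edge `box` (minimum image).
    Returns bin centres and g values; r_max must not exceed box / 2."""
    n = len(positions)
    r_max = box / 2.0 if r_max is None else r_max
    dr = r_max / n_bins
    hist = [0] * n_bins
    for i in range(n - 1):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            dx -= box * round(dx / box)  # nearest periodic image
            dy -= box * round(dy / box)
            dz -= box * round(dz / box)
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            if r < r_max:
                hist[min(int(r / dr), n_bins - 1)] += 1
    n_pairs = n * (n - 1) / 2.0
    volume = box ** 3
    r_mid, g = [], []
    for k, count in enumerate(hist):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        r_mid.append(0.5 * (r_lo + r_hi))
        g.append(count / (n_pairs * shell / volume))  # ideal-gas normalization
    return r_mid, g


# Uncorrelated test configuration: g(r) should scatter around 1.
random.seed(1)
BOX = 10.0
points = [tuple(random.uniform(0.0, BOX) for _ in range(3)) for _ in range(300)]
r_mid, g = radial_distribution(points, BOX, n_bins=20)
tail = [gi for ri, gi in zip(r_mid, g) if ri > 1.5]
print(f"mean g(r > 1.5) = {sum(tail) / len(tail):.3f}")
```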

This drawback has been overcome by a further slight adjustment of the σ parameter of the LJ potential at each site. Fig. 2b shows that the shape of the g(r) curve is then maintained over a computation period of 100 ps. Enlarging σ by about 3% has two positive effects:

1. It stabilizes the fcc lattice.
2. It brings the first-neighbour separation between the centres of two molecules (6.03 Å) into better agreement with experiment (6.09 Å, ref. 1).
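For an fcc lattice the nearest-neighbour separation d_nn fixes the cubic lattice constant, a = √2 d_nn, and each cubic cell holds four molecules. The system sizes N = 108 and N = 32 of Table 1 are therefore consistent with 3 × 3 × 3 and 2 × 2 × 2 cells. The sketch builds such a lattice of molecular centres and converts d_nn into a density, which shows what the change from 6.03 Å to 6.09 Å means in bulk terms. The molar mass of C6H12 (84.16 g/mol) is standard; the function names are hypothetical.

```python
import math
from itertools import combinations, product

N_A = 6.02214076e23   # Avogadro constant, mol^-1
M_C6H12 = 84.16       # molar mass of cyclohexane, g/mol
FCC_BASIS = ((0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5))


def fcc_lattice(n_cells, d_nn):
    """Return (sites, box edge) for an n_cells^3 fcc lattice with nearest-neighbour distance d_nn."""
    a = math.sqrt(2.0) * d_nn
    sites = [((i + bx) * a, (j + by) * a, (k + bz) * a)
             for i, j, k in product(range(n_cells), repeat=3)
             for bx, by, bz in FCC_BASIS]
    return sites, n_cells * a


def min_image_distance(p, q, box):
    """Distance between p and the nearest periodic image of q."""
    d2 = 0.0
    for pi, qi in zip(p, q):
        d = qi - pi
        d -= box * round(d / box)
        d2 += d * d
    return math.sqrt(d2)


def density_g_cm3(d_nn_angstrom, molar_mass=M_C6H12):
    """Density of an fcc molecular crystal: four molecules per cubic cell of edge sqrt(2)*d_nn."""
    a_cm = math.sqrt(2.0) * d_nn_angstrom * 1e-8
    molar_volume = N_A * a_cm ** 3 / 4.0  # cm^3/mol
    return molar_mass / molar_volume


sites, box = fcc_lattice(3, 6.03)
d_min = min(min_image_distance(p, q, box) for p, q in combinations(sites, 2))
print(len(sites), round(d_min, 3))
for d_nn in (6.03, 6.09):
    print(f"d_nn = {d_nn} A -> density = {density_g_cm3(d_nn):.3f} g/cm^3")
```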

On the other hand, as we checked previously for the liquid model, the slight adjustment of σ has only a small influence on the thermodynamic quantities and the transport coefficients. This improved model potential will be used in further computations of the phase behaviour of cyclohexane, especially for the low-temperature phase transitions.
TABLE 1

Technical details of the MD computation

Time step                       0.005 ps
Number of integration steps     7000 with N = 108 for the single-particle properties;
                                80000 with N = 32 for collective properties
                                (N = number of molecules)
Structure of the C6H12 model    chair form; separation between nearest LJ centres: 1.78 Å
LJ parameters                   ε/k = 78 K; … × 10⁻¹⁰ m; σ = 3.97 Å
SUMMARY

The molecular dynamics simulation technique has been applied to study the solvation of the Na+ ion in the binary solvent water-hexamethylphosphoric triamide (HMPT). Simulations with five different initial positions of the ion were performed. Both time-averaged and time-dependent properties of the water molecules were calculated separately in four regions: two regions of the MD box related to the polar and non-polar sites of the HMPT molecule, and the first and second solvation shells of the ion. Results are reported for various solute-solvent and solvent-solvent atom-atom radial distributions, the hydrogen-bonded network structure, IR spectra and the mutual arrangement of HMPT and the cation. Preferential solvation of the sodium ion by HMPT was observed in one of the five cases.


INTRODUCTION 

Investigating ionic solvation in binary mixed solvents at the microscopic level is essential for understanding the physical properties of electrolyte solutions and their behaviour in many chemical, technological and biological processes. Numerous experiments have been devoted to this problem, and some generalizations have even been obtained (e.g. ref. 1). Nevertheless, the microscopic features of these events are still poorly understood. Computer simulation is one of the most powerful tools for gaining insight into this question.

In this work the molecular dynamics simulation technique has been applied to the system Na+-water-HMPT. Our main reasons for interest in this system are as follows:

- HMPT (PO(N(CH3)2)3, Fig. 1) has 6 hydrophobic CH3 groups surrounding the P atom and a highly polar P-O group. Its Gutmann donor number is DN = 38.8, and only a few amines have higher DNs (ref. 2).
- The oxygen atom can form 2 strong H-bonds with water molecules, according to IR and NMR spectra and dipole measurements in CCl4. HMPT-water mixtures are therefore characterized by strong solvent-solvent interactions.
- HMPT is unique in combining hydrophobic and hydrophilic types of hydration with a high (nearly spherical) molecular symmetry. These interactions and structural peculiarities are reflected in a number of physical quantities of the solution.

Investigations of systems containing HMPT and water as a mixed solvent can therefore help to solve the long-standing problem of how the two types of hydration depend on each other.

HMPT has a high solvating ability for cations, so selective solvation of the Na+ ion can be expected in a water-HMPT mixture. NMR studies (ref. 3) have confirmed that the sodium ion is preferentially solvated by HMPT, and we attempted to verify this in computer simulations.

We previously applied the molecular dynamics method to the water-HMPT solution (ref. 4) and to the Na+ ion in water. Our results were in good agreement with experiment.


DETAILS OF THE SIMULATION
Molecular dynamics

The system considered in this paper consists of an ion, one HMPT and 205 water molecules in a cubic box with periodic boundary conditions. The box side length of 1.86 nm corresponds to the experimental density of the H2O-HMPT solution with an HMPT mole fraction x = 1/206 ≈ 0.0049 at 300 K. The equations of motion were integrated using a leap-frog algorithm (ref. 5) with a time step of 2 fs. The SHAKE procedure (ref. 6) was used to constrain the internal geometry of the HMPT and water molecules. The temperature was kept constant by coupling to an external heat bath (ref. 7) with a time constant of 0.1 ps.
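A leap-frog integrator with weak coupling to a heat bath (ref. 7) works in two stages. First, the half-step velocities are updated from the forces. They are then scaled by λ = [1 + (Δt/τ)(T0/T - 1)]^(1/2), where T is the instantaneous kinetic temperature. This sketch assumes the commonly used Berendsen form of the coupling. The one-dimensional particles, reduced units (m = k_B = 1), the omission of the SHAKE constraints and the function names are simplifications for illustration. Δt = 0.002 ps and τ = 0.1 ps are the values quoted above.

```python
import math


def kinetic_temperature(velocities, mass=1.0, k_b=1.0):
    """Instantaneous temperature of 1-D particles (one degree of freedom each)."""
    return mass * sum(v * v for v in velocities) / (len(velocities) * k_b)


def berendsen_lambda(t_inst, t_target, dt, tau):
    """Weak-coupling velocity scaling factor."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_inst - 1.0))


def leapfrog_step(x, v_half, force, dt, mass, t_target, tau):
    """Advance x(t), v(t - dt/2) to x(t + dt), v(t + dt/2) with Berendsen scaling."""
    lam = berendsen_lambda(kinetic_temperature(v_half, mass), t_target, dt, tau)
    f = force(x)
    v_new = [lam * (v + fi / mass * dt) for v, fi in zip(v_half, f)]
    x_new = [xi + vi * dt for xi, vi in zip(x, v_new)]
    return x_new, v_new


def zero_force(xs):
    """Free particles: no interactions."""
    return [0.0] * len(xs)


# Free particles started at T = 2 and coupled to a bath at T0 = 1.
DT, TAU = 0.002, 0.1                      # ps, as in the text
x = [0.0] * 4
v = [math.sqrt(2.0), -math.sqrt(2.0), math.sqrt(2.0), -math.sqrt(2.0)]
for _ in range(1000):                     # 2 ps = 20 coupling times
    x, v = leapfrog_step(x, v, zero_force, DT, 1.0, 1.0, TAU)
print(kinetic_temperature(v))
```

Without forces, each step moves T a fraction Δt/τ = 0.02 of the way toward T0. This makes explicit that τ sets the relaxation time of the temperature, not of any molecular motion.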

The HMPT molecule was inserted into a box containing a previously equilibrated system of 216 water molecules. The centre of the HMPT molecule was placed at the centre of the box, and the initial atomic coordinates of the HMPT molecule were taken from X-ray data. Water molecules closer than 0.2 nm to any atom of the HMPT were deleted. After insertion of the solute molecule, an equilibration period of 4 ps was applied.
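The insertion step just described places the solute at the box centre and deletes water molecules closer than 0.2 nm to any solute atom. The sketch below implements it, using the minimum-image convention for distances in the periodic box. The text does not say whether the 0.2 nm criterion was applied to every water atom or only to the water oxygen. This sketch assumes every atom, and it also assumes the geometric (not mass-weighted) centre of the solute. The function names and coordinates are hypothetical.

```python
import math

BOX = 1.86  # nm, box edge from the text


def min_image_distance(p, q, box=BOX):
    """Distance between p and the nearest periodic image of q."""
    d2 = 0.0
    for pi, qi in zip(p, q):
        d = qi - pi
        d -= box * round(d / box)
        d2 += d * d
    return math.sqrt(d2)


def center_in_box(atoms, box=BOX):
    """Translate atoms so that their geometric centre sits at the box centre."""
    n = len(atoms)
    centre = [sum(a[k] for a in atoms) / n for k in range(3)]
    shift = [box / 2.0 - c for c in centre]
    return [tuple(a[k] + shift[k] for k in range(3)) for a in atoms]


def delete_overlapping(waters, solute, r_min=0.2, box=BOX):
    """Keep only water molecules with no atom closer than r_min to any solute atom."""
    return [mol for mol in waters
            if all(min_image_distance(w, s, box) >= r_min for w in mol for s in solute)]


solute = center_in_box([(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)])
waters = [
    [(1.00, 0.93, 0.93), (1.05, 0.93, 0.93)],  # overlaps the solute: deleted
    [(1.50, 0.93, 0.93)],                      # 0.52 nm from the nearest solute atom: kept
]
kept = delete_overlapping(waters, solute)
print(len(kept))
```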

Five molecular dynamics simulations of the sodium-water-HMPT system, labelled A, B, C, D and E (Fig. 1), were performed. For reference, S denotes the MD run of the water-HMPT system from ref. 4, and I the Na+-H2O simulation. For each of the five runs we chose a water molecule near one of the HMPT atoms and replaced it by a sodium ion. Initial distances between the Na+ ion and two atoms of the HMPT molecule are listed in Table 1. Each 5-ps run was executed after an equilibration procedure (usually 4-5 ps). The length of a simulation was limited by the type of computer used. It was chosen to give at least qualitative results for properties with short characteristic times.

Model parameters

The model potential for the water intermolecular interactions is the SPC model developed by Berendsen et al. (ref. 8). An HMPT-HMPT intermolecular potential, also of 12-6-1 type, was defined in similar terms (ref. 4). The Lennard-Jones force constants were calculated by energy minimization of the HMPT dimer. Charges were chosen to give a satisfactory value for the dipole moment of HMPT. The methyl groups of HMPT were treated as united atoms. The potential used for ionic interactions was taken from ref. 9. Interaction parameters of the Na+ ion with other atoms, and those of the water-HMPT interactions, were obtained using combination rules. A cutoff radius of 0.9 nm was applied for all interactions, and the "shifted force" modification (ref. 10) of the potential was employed to avoid truncation errors.

TABLE 1

Distances (in nm) between the Na+ ion and HMPT atoms (a)

MD run   Initial          Equilibrium (b)
         rIOX   rIPH      rIOX   rIPH
A        0.50   0.63      0.23   0.37
B        0.74   0.87      0.58   0.72
C        0.40   0.46      0.90   0.98
D        0.52   0.52      0.95   0.95
E        0.69   0.55      1.13   0.99

(a) IOX and IPH refer to ion-oxygen and ion-phosphorus separations.
(b) Distances averaged over equilibrium configurations.

Fig. 1. Evolution of the ion position in the MD box during the simulation.
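In the shifted-force modification, a pair potential V(r) truncated at r_c is replaced, for r < r_c, by V_sf(r) = V(r) - V(r_c) - (r - r_c) V'(r_c). Both the energy and the force then go to zero continuously at the cutoff. The sketch applies this to a Coulomb term with r_c = 0.9 nm. The factor 138.935458 kJ mol⁻¹ nm e⁻² is the usual 1/(4πε0) in MD units. The charges (+1 e for Na+ and -0.82 e, the SPC oxygen charge) serve only as an example, and the helper names are hypothetical.

```python
COULOMB_FACTOR = 138.935458  # kJ mol^-1 nm e^-2, i.e. 1/(4*pi*eps0) in MD units


def shifted_force(v, dv, r_c):
    """Return V_sf(r) = V(r) - V(r_c) - (r - r_c) * V'(r_c) for r < r_c, and 0 otherwise."""
    v_c, dv_c = v(r_c), dv(r_c)

    def v_sf(r):
        if r >= r_c:
            return 0.0
        return v(r) - v_c - (r - r_c) * dv_c

    return v_sf


def coulomb(q1, q2):
    """Plain Coulomb energy and its derivative (r in nm, charges in e)."""
    k = COULOMB_FACTOR * q1 * q2
    return (lambda r: k / r), (lambda r: -k / r ** 2)


v, dv = coulomb(1.0, -0.82)          # Na+ with an SPC-like oxygen charge
v_sf = shifted_force(v, dv, r_c=0.9)
for r in (0.25, 0.5, 0.85, 0.9):
    print(f"r = {r:.2f} nm  V = {v(r):9.3f}  V_sf = {v_sf(r):9.3f} kJ/mol")
```

The shift changes the potential at every distance inside the cutoff, not only near r_c. That is the price paid for removing the energy and force discontinuities.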


SOLVENT  STRUCTURE 

The complex character of the system leads to differences in the water distribution between different parts of the MD box. Water-water radial distributions were therefore calculated separately near the polar and hydrophobic sites of HMPT and in the first and second solvation shells of the sodium ion.

In  our  MD  simulations  we  found  three  types  of  relatively  stable  positions 
of  the  sodium  ion  with  regard  to  the  HMPT  molecule.  Averaged  distances  between 
the  Na+  ion  and  the  HMPT  molecule  in  runs  A  and  B  correspond  to  positions  of 
the  first  and  second  minima  of  the  sodium-HMPT  mean  force  potential.  In  runs  C, 
D  and  E  the  ion  is  situated  far  from  the  HMPT  molecule. 

In the equilibrium configuration A, the Na+ ion replaces one of the two water molecules in the first solvation shell of the HMPT oxygen atom. In the water-HMPT mixture, these two water molecules form hydrogen bonds with the HMPT oxygen atom. The Na+ ion distorts, or even breaks, the hydrogen bond between the second water molecule and the HMPT oxygen atom. In configuration A the first solvation shell of the sodium ion contains four water molecules in all. Their arrangement is energetically favourable and hence stable. However, the orientations of these water molecules are mostly determined by the ion and the HMPT oxygen atom together. Links between water molecules of the first and second solvation shells are therefore weak: the average number of hydrogen bonds per water molecule is only 0.75. The results of this run (Tables 1, 2) are very similar to those obtained from the MD simulation of a Na+ ion bound to a dihydrogen phosphate ion in a small water cluster (ref. 11).

In run C we observed a hexahydrated sodium ion, with the water molecules in an octahedral arrangement practically identical to that found in the Na+-H2O solution simulation (run I). However, the HMPT molecule changes the orientation of the water molecules at the octahedron apexes somewhat. This causes deviations in the hydrogen bond angles and energies.

When the sodium ion is inserted in the hydrophobic region of HMPT hydration (run E), the first peak in the water oxygen-oxygen RDF near the polar group of HMPT (Fig. 2) becomes weaker than in the water-HMPT simulation (run S). The same holds for the first peak of the HMPT oxygen-water oxygen RDF (Fig. 3). We conclude that the sodium ion exerts a structure-breaking influence on the water network near the hydrophilic group of the HMPT molecule. Interestingly, the sodium-water oxygen radial distributions are similar in the mixed solvent and in water (Fig. 4).


Fig. 2. Water oxygen-oxygen radial distribution functions near different sites of the solute molecule and averaged over the whole box, from runs S (dots) and E (full line).

Fig. 3. Solute atom-water oxygen radial distribution functions (lines as in Fig. 2).

Fig. 4. Ion-oxygen RDF from runs I (dots) and E (full line).
An analysis of the hydrogen bonds gives additional information about the solvent structure. Hydrogen bonds were defined by geometrical criteria only (r(O···H) < 0.2 nm and angle O-H···O > 140°). Fig. 5 displays the orientational and energetic distributions of water hydrogen bonds around the HMPT molecule. Table 2 collects the average numbers of hydrogen bonds per water molecule and some geometrical characteristics of the water arrangement in the first solvation shell of the Na+ ion. The data of Table 2 and Fig. 5 confirm that the ion changes the order of water in both regions of HMPT hydration. In its presence, water is more ordered in the non-polar region and less ordered in the polar region than without it.
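The geometric hydrogen-bond criterion can be written as a small predicate. The sketch reads the distance condition as the intermolecular H···O distance (r < 0.2 nm). It reads the angle condition as the O-H···O angle at the hydrogen (> 140°), where 180° means a linear bond. That reading of the distance symbol and the function name are assumptions.

```python
import math


def is_hydrogen_bond(o_donor, h, o_acceptor, r_max=0.2, angle_min=140.0):
    """Geometric H-bond test (positions in nm): H...O_acceptor < r_max and
    angle O_donor-H...O_acceptor (measured at H) > angle_min degrees."""
    if math.dist(h, o_acceptor) >= r_max:
        return False
    u = [a - b for a, b in zip(o_donor, h)]      # H -> donor O
    w = [a - b for a, b in zip(o_acceptor, h)]   # H -> acceptor O
    cos_angle = sum(x * y for x, y in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle > angle_min


od, hh = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0)
linear = (0.28, 0.0, 0.0)                                   # 0.18 nm, 180 degrees
bent = (0.1 + 0.18 * math.cos(math.radians(60.0)),
        0.18 * math.sin(math.radians(60.0)), 0.0)           # 0.18 nm, 120 degrees
far = (0.35, 0.0, 0.0)                                      # 0.25 nm, 180 degrees
print([is_hydrogen_bond(od, hh, a) for a in (linear, bent, far)])
```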


TABLE 2

Geometrical properties of the water arrangement around the sodium ion

Run   N     ∠Na-O-H (°)   α (°)   β (°)   γ (°)   nhb    H-bond angle (°)
A     4.0   82, 154       45      130     90      0.75   170
B     5.5   103, 135      40      105     60      1.09   165
C     6.0   108, 130      30      105     60      1.50   160

The columns are defined as follows:

- N: hydration number.
- ∠Na-O-H: angles between the Na-O and O-H vectors.
- α, β and γ: angles between the Na-O vector and the water molecule rotation vectors Ex, Ey and Ez, respectively. The x axis coincides with the direction of the dipole moment of the water molecule, y is perpendicular to x in the H-O-H plane, and z is perpendicular to the H-O-H plane.
- nhb: average number of hydrogen bonds per water molecule in the first solvation shell.
- H-bond angle: average angle of these hydrogen bonds.


Fig. 5. Angular and energy distributions of hydrogen bonds in water (lines as in Fig. 2).


KINETIC  PROPERTIES 

To obtain further information about the peculiarities of sodium solvation in the mixed solvent, we have also calculated various auto- and cross-correlation functions and their Fourier transforms. Some of them are discussed below.

The relaxation times of the dipole moments of water molecules in the first solvation shell of the Na+ ion are very similar in runs A, B and C (Fig. 6), and their value (2.5 to 2.6 ps) is in good agreement with the experimental one (ref. 12) and with the results of our MD simulation of the water-HMPT solution (ref. 4). However, the relaxation times for rotation of the unit vectors E_y and E_z differ significantly from those of the dipole moment, so the rotation of the shell water molecules is anisotropic, and this anisotropy increases in the order C, A, B. We suppose that this increase results from the dominant Coulomb interactions in run A, and from the formation of a common hydration shell of the sodium ion and the solute oxygen atom in run B. Rotation of the shell water molecules in runs D and E, as in run C, is nearly isotropic, apparently because the ion is positioned relatively far from the HMPT in these cases.
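As an illustration of how such relaxation times can be extracted, the sketch below computes a first-order orientational autocorrelation function of a molecular unit vector (E_x, E_y or E_z) and fits ln C(t) with a straight line, in the spirit of the logarithmic plot of Fig. 6. The first-order (cosine) correlation function, the log-linear fit and the fitting range are assumptions; the authors' exact definitions are not given here.

```python
import numpy as np


def orientational_acf(u, max_lag):
    """C(t) = <u(t0) . u(t0 + t)>, averaged over time origins and molecules.

    u: array (n_frames, n_molecules, 3) of unit vectors; max_lag < n_frames.
    """
    n_frames = u.shape[0]
    c = np.empty(max_lag)
    for lag in range(max_lag):
        dots = np.sum(u[: n_frames - lag] * u[lag:], axis=-1)
        c[lag] = dots.mean()
    return c


def relaxation_time(c, dt, first, last):
    """Fit ln C(t) = const - t / tau over lags first..last-1 and return tau."""
    t = np.arange(first, last) * dt
    slope, _ = np.polyfit(t, np.log(c[first:last]), 1)
    return -1.0 / slope


# Synthetic check: an exactly exponential C(t) with tau = 2.5 ps, dt = 0.05 ps.
c_model = np.exp(-np.arange(100) * 0.05 / 2.5)
tau_model = relaxation_time(c_model, 0.05, 1, 80)
```

A log-linear fit is only meaningful over the range where ln C(t) is close to a straight line; the short-time inertial part is therefore excluded by starting the fit at a later lag.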

The spectral densities of the hindered translations of the Na+ ion and of the shell and bulk water molecules are shown in Fig. 7. The diffusional component of the motion of Na+ and of the shell water molecules is larger than for bulk water and for water in the H2O-HMPT solution. This difference in diffusional motion can be related to the preferential solvation of Na+ by HMPT.
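The diffusional (zero-frequency) part of such a spectral density is related to the diffusion coefficient through the Green-Kubo relation D = (1/3)∫<v(0)·v(t)>dt. A minimal sketch, assuming velocities stored as an (n_frames, n_particles, 3) array and a simple trapezoidal integral; the exponential model VACF is hypothetical test data.

```python
import numpy as np


def velocity_acf(v, max_lag):
    """<v(t0) . v(t0 + t)> averaged over time origins and particles.

    v: array (n_frames, n_particles, 3); max_lag < n_frames.
    """
    n = v.shape[0]
    return np.array([np.mean(np.sum(v[: n - k] * v[k:], axis=-1))
                     for k in range(max_lag)])


def diffusion_coefficient(vacf, dt):
    """Green-Kubo estimate D = (1/3) * integral of the VACF (trapezoidal rule).

    This integral is the zero-frequency value of the cosine transform
    integral C(t) cos(w t) dt, divided by 3.
    """
    vacf = np.asarray(vacf, dtype=float)
    integral = dt * (vacf.sum() - 0.5 * (vacf[0] + vacf[-1]))
    return integral / 3.0


# Hypothetical exponential VACF: <v^2> = 3, decay time 0.1, so D = 0.1.
t = np.arange(2001) * 0.001
vacf_model = 3.0 * np.exp(-t / 0.1)
d_model = diffusion_coefficient(vacf_model, 0.001)
```

In practice the integral must be truncated where the VACF has decayed, which is one reason why short runs give only qualitative information on transport.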


Fig. 6. Logarithm of the components of the autocorrelation functions of the shell water molecules' rotation vectors: E_x (dashed), E_y (dots) and E_z (full). Axes are defined in the Table 2 footnote.


shell  water  (dashed)  and  bulk  water  (dots). 


[Figure axis: ν (cm⁻¹)]


Fig. 8. Spectral densities (arbitrary units) calculated by Fourier transformation of the components of the angular velocity autocorrelation functions of the shell water molecules (see Fig. 6).


The picture of preferential solvation is also supported by the investigation of the librational motions of the shell water molecules (Fig. 8). The normalized angular velocity autocorrelation functions have been Fourier transformed for the components along the three main axes of the water molecule (as described in the footnote of Table 2). The peaks corresponding to librations are shifted to higher frequencies in runs A and B relative to run C. Therefore the librational motions of the first-shell water molecules are influenced not only by the strong interactions with the ion but also by the interactions with the other molecules belonging to the first and second solvation shells, and the influence of the polar site of HMPT prevails over that of the nearest water molecules.
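The Fourier transformation of a normalized angular velocity autocorrelation function, and the location of its librational peak in cm⁻¹, might look like the sketch below. The one-sided cosine-transform convention, the trapezoidal weights and the synthetic damped-cosine test signal (500 cm⁻¹, 0.5 ps decay) are assumptions for illustration only.

```python
import numpy as np

C_CM_PER_PS = 2.99792458e-2  # speed of light in cm/ps


def spectral_density(acf, dt_ps, wavenumbers_cm):
    """S(nu) = integral_0^T C(t) cos(2 pi c nu t) dt for a normalized ACF.

    acf: samples C(0), C(dt), ...; wavenumbers_cm: grid in cm^-1.
    The result is in arbitrary units (ps).
    """
    c = np.asarray(acf, dtype=float) / acf[0]        # normalize so C(0) = 1
    t = np.arange(len(c)) * dt_ps
    w = np.full(len(c), dt_ps)
    w[0] = w[-1] = 0.5 * dt_ps                       # trapezoidal end weights
    omega = 2.0 * np.pi * C_CM_PER_PS * np.asarray(wavenumbers_cm, dtype=float)
    return np.cos(np.outer(omega, t)) @ (w * c)      # one cosine sum per omega


# Synthetic libration: damped cosine at 500 cm^-1 with a 0.5 ps decay time.
dt = 0.002
t = np.arange(2000) * dt
acf = np.cos(2 * np.pi * C_CM_PER_PS * 500.0 * t) * np.exp(-t / 0.5)
nu = np.arange(0.0, 1001.0)
s = spectral_density(acf, dt, nu)
peak = nu[np.argmax(s)]
```

Comparing the position of such peaks between runs is what the shift of the librational bands in Fig. 8 amounts to.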

It is impossible to investigate ionic transport quantitatively using short MD experiments such as this one. But some qualitative pictures elucidating the mechanism of ionic mobility can be obtained through direct observation, as shown for example in Fig. 9. This is one of the representative segments of the functions r_NaP(t) and N(t) from run E. The value of N is defined simply as the number of water molecules lying within the first solvation shell at a given time step. One can see that the changes in ionic velocity relative to HMPT, which coincide with the extrema of the upper curve in Fig. 9, occur at the same time as a water molecule leaves or enters the first solvation shell. Thus ionic motion is closely connected with local density fluctuations in the solvent. This result is in good agreement with our previous conclusions (ref. 1).

Fig. 9. Distance between the Na+ ion and the HMPT phosphorus (above) and running hydration number of the ion (below) as functions of time t (ps).
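The running hydration number N(t) and its comparison with the extrema of r_NaP(t) described above can be sketched as follows. The shell radius, the array layouts, the function names and the tiny trajectory are hypothetical, and periodic images are ignored.

```python
import numpy as np


def running_hydration_number(ion_xyz, water_o_xyz, r_shell):
    """N(t): number of water oxygens within r_shell of the ion in each frame.

    ion_xyz: (n_frames, 3); water_o_xyz: (n_frames, n_water, 3).
    r_shell is a hypothetical first-shell radius (e.g. the first minimum
    of the Na-O radial distribution function).
    """
    d = np.linalg.norm(water_o_xyz - ion_xyz[:, None, :], axis=-1)
    return (d < r_shell).sum(axis=1)


def exchange_events(n_t):
    """Frames where N(t) changes, with the change (+1 entry, -1 exit)."""
    dn = np.diff(n_t)
    frames = np.nonzero(dn)[0] + 1
    return frames, dn[frames - 1]


def turning_points(r_t):
    """Frames where dr/dt changes sign, i.e. local extrema of r_NaP(t)."""
    slope_sign = np.sign(np.diff(r_t))
    return np.nonzero(np.diff(slope_sign))[0] + 1


# Tiny hypothetical trajectory: ion fixed at the origin, two oxygens (nm).
ion = np.zeros((3, 3))
oxygens = np.array([[[0.25, 0.0, 0.0], [0.50, 0.0, 0.0]],
                    [[0.25, 0.0, 0.0], [0.0, 0.25, 0.0]],
                    [[0.25, 0.0, 0.0], [0.50, 0.0, 0.0]]])
n_t = running_hydration_number(ion, oxygens, r_shell=0.3)
frames, changes = exchange_events(n_t)
```

Matching the frames returned by `exchange_events` against those of `turning_points` is one simple way to quantify the coincidence seen by eye in Fig. 9.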
CONCLUSIONS

Molecular dynamics simulations of a ternary system consisting of an ion, an organic molecule and water show quantitatively how both the sodium ion and the HMPT molecule influence the arrangement of the surrounding water. This information can help answer one of the fundamental questions of the physical chemistry of multicomponent solutions: how do solute molecules of different types influence one another's solvation?

It was confirmed that preferential solvation of Na+ by HMPT can occur in the water-HMPT mixed solvent.

A clear correlation was found between the diffusive displacements of the sodium ion and the fluctuations of its coordination number. Hence the link between ionic mobility and the structure of the first solvation shell of this ion can be studied in this way.
SUMMARY

The different mechanisms proposed for phase transitions in molecular crystals are reviewed. To gain more insight into these mechanisms, we have used an atom-atom potential method to calculate possible crystallographic paths for second- and first-order transitions.

1. INTRODUCTION 

The ability of a substance to crystallize in different crystal forms gives rise to polymorphism. A variety of solids transform from one crystal structure to another, and the subject of phase transitions has attracted special interest in recent years because of its relevance to other branches of science such as crystallography, geology, metallurgy and pharmacy (ref. 1). We shall focus here on organic molecules, for which polymorphism is fairly common. In organic crystals the cohesion arises mainly from van der Waals forces and hydrogen bonds; solids of this type are called molecular crystals. A transformation is designated as enantiotropic if the change from one polymorphic modification to another can be reversed in the solid state as the temperature or pressure is varied. If a change between two forms is irreversible in the solid state, the forms are monotropic to each other. If the higher-melting form has the lower heat of fusion, the two forms are usually enantiotropic; otherwise they are monotropic.
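The heat-of-fusion rule stated above can be written as a small decision function. It is only a rule of thumb ("usually"), and the melting points and heats of fusion used in the check are hypothetical.

```python
def heat_of_fusion_rule(tm_a, dhf_a, tm_b, dhf_b):
    """Empirical heat-of-fusion rule for a pair of polymorphs A and B.

    tm_*: melting points (K); dhf_*: heats of fusion (any consistent unit).
    Returns 'enantiotropic' if the higher-melting form has the lower heat
    of fusion, otherwise 'monotropic'.
    """
    if tm_a == tm_b:
        raise ValueError("the rule needs two different melting points")
    # Heat of fusion of the higher-melting and of the lower-melting form
    dh_high, dh_low = (dhf_a, dhf_b) if tm_a > tm_b else (dhf_b, dhf_a)
    return "enantiotropic" if dh_high < dh_low else "monotropic"
```

For example, `heat_of_fusion_rule(400.0, 20.0, 390.0, 25.0)` classifies a hypothetical pair whose higher-melting form has the smaller heat of fusion as enantiotropic.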

The transition of a material from one crystalline phase to another involves displacements of the molecules. These dynamic properties are currently studied by a wide variety of experimental methods such as crystallography, N.M.R. and second-moment measurements, neutron scattering, and far-infrared and Raman spectroscopies. The classification of phase transitions into various types has been prompted by the expectation that it would facilitate an understanding of their microscopic mechanism. The various classifications have been discussed by several authors (ref. 2). It is well known that there are two kinds of phase transitions: first-order transitions, in which energy, volume and crystal structure change discontinuously, and second-order transitions, in which energy and volume change continuously but the temperature derivatives of these quantities have singularities. Mechanistically, phase transformations classified as second order are generally associated with cooperative mechanisms and displacive transitions. The mechanisms of such phase transitions can be explained in terms of "soft" modes of lattice vibration: the agent of cooperative molecular reorientation is a strongly anharmonic lattice vibration. When the structures of both phases are known, it is possible to deduce the mechanism by finding a suitable soft mode and to devise an experimental method for detecting the softening as the transition point is approached. However, only a small number of phase transitions show soft-mode behaviour. Chloranil (ref. 3), malononitrile (ref. 4) and N-nitrodimethylamine (ref. 5) exhibit soft-mode behaviour confirmed by experimental measurements.
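For reference, the two kinds of transition described above can be summarized in Ehrenfest's terms, taking the "energy" to be the enthalpy H; the last line is the usual textbook statement that the soft-mode frequency of a displacive transition vanishes at the transition temperature. These relations are standard background, not results of this work.

```latex
\begin{aligned}
&\text{first order:} && \Delta V \neq 0, \quad \Delta H = T_t\,\Delta S \neq 0 \quad \text{at } T = T_t,\\
&\text{second order:} && \Delta V = 0, \quad \Delta H = 0, \quad
  C_P = \left(\frac{\partial H}{\partial T}\right)_P \ \text{and}\ \left(\frac{\partial V}{\partial T}\right)_P \ \text{singular at } T_t,\\
&\text{soft mode (displacive):} && \omega_{\mathrm{s}}^{2}(T) \propto \lvert T - T_c \rvert \quad \text{as } T \to T_c .
\end{aligned}
```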

Mnyukh  and  coworkers  (ref. 6)  deny  the  existence  of  any  but  first  order 
phase  transitions.  They  point  out  that  differences  in  the  phase  transition 
behaviour  can  be  the  result  of  experimental  inadequacies.  In  this  view,  a 
polymorphic  transition  occurs  by  way  of  growth  of  well  shaped  crystals  in  the 
parental  crystal  medium.  The  necessary  condition  for  polymorphic  transition  is 
the  presence  in  the  crystal  of  lattice  imperfections  called  nucleation  centers 
of  a  definite  quality  and  in  a  sufficient  concentration.  These  lattice  defects 
may  be  smaller  than  1  micron  and  therefore  may  not  always  be  visible  under  the 
optical  microscope. 


2.  THE  ATOM-ATOM  POTENTIAL  METHOD 


A molecular crystal is characterized by molecules that keep their individual characteristics and properties. This is a consequence of the fact that the molecules in a crystal are bound together by forces that are very weak compared with the intramolecular forces. As a consequence, the external and internal contributions to the crystal potential energy can be separated. The molecule-molecule interaction is split into a sum of independent contributions, namely the attractive (dispersion), repulsive, electrostatic and polarization terms:


$$E = E_{\mathrm{rep}} + E_{\mathrm{disp}} + E_{\mathrm{el}} + E_{\mathrm{pol}}$$


All calculations involve the pairwise evaluation of nonbonded interactions between the atoms composing the molecules in the crystal. Among the most successful potential functions describing such interactions is the "exp-6" function, which consists of a repulsive exponential part and an attractive inverse-sixth-power part; with the electrostatic and polarization terms added, the energy reads
$$E = \sum_i \sum_j \left( A_{ij}\, e^{-B_{ij} R_{ij}} - C_{ij}\, R_{ij}^{-6} \right) + \sum_i \sum_j \frac{q_i\, q_j}{R_{ij}} - \frac{1}{2} \sum_i \alpha_i\, \varepsilon_i^{2}$$

where A_ij, B_ij and C_ij are empirical parameters, R_ij is the distance between atoms i and j, q_i is the charge on atom i, α_i is the mean polarizability attributed to atom i and ε_i is the electric field created at atom i by all other molecules. The atom-atom potential method, derived from static crystal properties, may also be used to predict the lattice dynamics of organic crystals; it is an invaluable tool in the interpretation of infrared, Raman and neutron scattering experiments. The calculations have been applied successfully to internal libration (ref. 7), reorientations and orientational disorder (ref. 8), and dislocations (ref. 9). Several reviews and books have summarized progress in this field in recent years (ref. 10). These results encouraged us to explore whether potential energy calculations can account for the features of polymorphic phase transformations in crystals. Our work is an attempt to do so for both second-order and first-order phase transitions.
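A direct transcription of the exp-6, Coulomb and polarization terms of the energy expression above is sketched below. The parameter values, units (Å, e, kcal/mol), data layout and helper names are hypothetical; the paper itself uses Claverie's formalism and parameterization, which are not reproduced here.

```python
import numpy as np

COULOMB_KCAL = 332.0637  # e^2 / (4 pi eps0) in kcal * Å / mol


def _pair_params(params, ti, tj):
    """Look up (A, B, C) for an atom-type pair in either order."""
    return params[(ti, tj)] if (ti, tj) in params else params[(tj, ti)]


def pair_energy(mol_a, mol_b, params):
    """Exp-6 + Coulomb energy between two molecules (i in a, j in b).

    mol_*: (xyz array (n, 3) in Å, charges (n,) in e, list of atom types).
    params[(ti, tj)] = (A kcal/mol, B 1/Å, C kcal*Å^6/mol), hypothetical.
    """
    (xa, qa, ta), (xb, qb, tb) = mol_a, mol_b
    e = 0.0
    for xi, qi, ti in zip(xa, qa, ta):
        for xj, qj, tj in zip(xb, qb, tb):
            r = np.linalg.norm(xi - xj)
            a, b, c = _pair_params(params, ti, tj)
            e += a * np.exp(-b * r) - c / r**6 + COULOMB_KCAL * qi * qj / r
    return float(e)


def polarization_energy(target_xyz, target_alpha, others):
    """-1/2 sum_i alpha_i |eps_i|^2, eps_i = field at atom i from other molecules.

    target_alpha in Å^3; others: list of (xyz (m, 3), charges (m,)).
    """
    e = 0.0
    for xi, ai in zip(target_xyz, target_alpha):
        field = np.zeros(3)
        for xyz, q in others:
            d = xi - np.asarray(xyz)             # vectors from source atoms to atom i
            r = np.linalg.norm(d, axis=1)
            field += (np.asarray(q)[:, None] * d / r[:, None] ** 3).sum(axis=0)
        e -= 0.5 * COULOMB_KCAL * ai * field @ field
    return float(e)


# Hypothetical example: two uncharged "Cl" sites 3 Å apart.
mol1 = (np.array([[0.0, 0.0, 0.0]]), np.array([0.0]), ["Cl"])
mol2 = (np.array([[3.0, 0.0, 0.0]]), np.array([0.0]), ["Cl"])
params = {("Cl", "Cl"): (1000.0, 3.0, 500.0)}
e_example = pair_energy(mol1, mol2, params)
e_pol_example = polarization_energy(np.array([[0.0, 0.0, 0.0]]), [1.0],
                                    [(np.array([[2.0, 0.0, 0.0]]), np.array([1.0]))])
```

A lattice energy is obtained by summing `pair_energy` over the molecules of a reference cell and their neighbours within the chosen shells, which is the quantity minimised in the calculations described below.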

For example, a possible crystallographic path for the polymorphic transformation of trans-trans (Z,Z) diacetamide into its trans-cis (E,Z) form has been calculated (ref. 11). This transition is believed to be displacive. A plausible process for the phase transformation between the two forms consists of two types of molecular movement: one displaces the centers of symmetry of pairs of molecules of the Z,Z form, and the other combines rotation of the molecule as a whole with internal rotation of the acetyl groups. The calculated barriers along this path are low enough to explain the ease with which diacetamide undergoes isomerization in the crystal. We now focus on a first-order phase transition.

3.  P-DICHLOROBENZENE  PHASE  TRANSITIONS 

It is well known that the gamma <--> alpha and alpha <--> beta transitions of p-dichlorobenzene occur by nucleation and growth processes. The gamma <--> alpha transition has been studied by infrared and N.Q.R. spectroscopy as well as by differential scanning calorimetry. Reynolds (ref. 12) has postulated that a nucleus is produced by a series of fluctuations involving a large number of molecules over a long range. The size of the critical nucleus is about 1000 molecules; a nucleus reaching this size is likely to continue its growth to produce a macroscopic daughter crystal. It should be noted that both Mnyukh and Reynolds admit the existence of a nucleation step. However, Reynolds disagrees with Mnyukh and suggests that the nucleation step is often rate-determining and that crystal growth requires a mutual orientation of parent and daughter crystals, whereas for Mnyukh the daughter crystal is oriented randomly with respect to the parent crystal. For Reynolds, first-order phase transitions in molecular crystals are fundamentally martensitic. A martensitic transformation is heterogeneous: it involves nucleation of a crystal followed by rapid growth through a mechanism involving no long-range translational diffusion. These characteristics can be discussed in terms of the possible epitaxial combinations allowed by the low symmetry of the phases, while the presence of imperfections in the crystal retards the growth rates.

In this context we wish to consider, as a first step, the application of potential energy calculations to the modelling of nucleation for the gamma <--> alpha phase transformation of p-dichlorobenzene. The structures of the gamma and alpha forms are known from X-ray crystallographic determinations (see Fig. 1), and G.L. Wheeler and S.D. Colson (ref. 13) have suggested a possible way of superposing the alpha and gamma crystals.

3.1  Superposition  of  gamma  and  alpha  cells 

Following Colson's ideas, we take as the ab plane the plane containing molecule 1 at (0,0,0) and molecule 2 at (0,1/2,1/2), which is obtained from molecule 1 by a helical rotation. The cell parameters are thereby changed, and the gamma and alpha crystals can be described with the following doubled cells:

gamma: a = 14.026 Å, b = 6.021 Å, c = 7.414 Å, β = 102.72°, P2₁/c, Z = 4, V = 610.75 Å³
alpha: a = 14.664 Å, b = 5.740 Å, c = 7.850 Å, β = 111.77°, P2₁/a, Z = 4, V = 613.62 Å³

These doubled cells are necessary for calculating a crystallographic path (see below). The centers of mass of the four molecules are at (0,0,0), (0,1/2,1/2), (1/2,1,1/2) and (1/2,1/2,0) in the gamma form, and at (0,1/2,0), (1/2,1,0), (0,1/2,1/2) and (1/2,1,1/2) in the alpha form.
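The quoted cell volumes can be checked with V = abc sin β for a monoclinic cell. The short check below uses only the parameters given in the text and in Fig. 1; the variable names are ours.

```python
import math


def monoclinic_volume(a, b, c, beta_deg):
    """Cell volume V = a b c sin(beta) for a monoclinic cell (lengths in Å)."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Doubled cells (Z = 4) quoted in the text
v_gamma = monoclinic_volume(14.026, 6.021, 7.414, 102.72)
v_alpha = monoclinic_volume(14.664, 5.740, 7.850, 111.77)

# Original cells of Fig. 1 (Z = 2); doubling the cell should double the volume
v_gamma_z2 = monoclinic_volume(8.624, 6.021, 7.414, 127.51)
v_alpha_z2 = monoclinic_volume(14.664, 5.74, 3.925, 111.77)
```

Both doubled volumes reproduce the values quoted above to within rounding, and each is twice the volume of the corresponding Z = 2 cell, which confirms that the doubled cells describe the same crystals.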

The structure of the alpha form can now be regarded as consisting of piles of molecules parallel to the c direction. These piles contain either molecules of type 1 (for instance molecules 1 and 3 in the pile at (0,0,c)) or molecules of type 2 (molecules 2 and 4 in the pile at (1/2,1/2,c)). Note that within one pile the molecules are parallel, molecules of type 2 being obtained from molecules of type 1 by a symmetry rotation. Furthermore, in each pile the molecules are separated by 3.925 Å.

To rearrange into such piles, we may assume that molecules of the gamma crystal move in the bc planes along b. In these planes, molecules are arranged in piles in which their centers are related by a translation of 7.414 Å along c, but each pile is shifted by c/2 (i.e. 3.707 Å) relative to its neighbour, so that the piles can be moved to interleave one another and form alpha-like piles, as shown in Fig. 1. Under this assumption, molecules will build up new alpha-like piles at b/4, 3b/4 and so forth. Accordingly, we displace the centers of mass of the four molecules of the alpha form by -b/4, which gives the new positions (0,1/4,0), (1/2,3/4,0), (0,1/4,1/2) and (1/2,3/4,1/2). Finally, we choose the superposition in which the centers of mass coincide along the a and b directions. This superposition allows us to displace and rotate the molecules of the gamma-phase unit cell, while also modifying the cell parameters, in order to build an alpha phase described in the same reference frame as the gamma phase.

Gamma form: P2₁/c, Z = 2, a = 8.624 Å, b = 6.021 Å, c = 7.414 Å, β = 127.51°
Alpha form: P2₁/a, Z = 2, a = 14.664 Å, b = 5.74 Å, c = 3.925 Å, β = 111.77°

Fig. 1. Gamma and alpha forms of p-dichlorobenzene.

3.2 Superposition of molecules

Molecules 1 and 3 of the unit cell of the gamma form are normally related to one another by a translation of a/2. To make molecules 1 and 3 of the gamma form coincide with molecules 1 and 4 of the alpha form respectively, we have to move them in opposite directions: molecule 1 in the -b direction and molecule 3 in the +b direction. The same is true for molecule 2 (+b direction) and molecule 4 (-b direction) of the gamma form, which must coincide with molecules 3 and 2 of the alpha form respectively. These movements explain the need for doubling the original gamma and alpha cells.

To achieve complete coincidence, three kinds of geometrical transformations are needed:

- displacement of the centers of mass,

- rotation of the Cl-Cl axis of the molecules so that all chlorine atoms coincide,

- rotation of the molecular planes around the Cl-Cl axis so that all atoms coincide.

The main feature of this process is that only small rotations (of the Cl-Cl axis and of the molecular plane) are needed to make molecule 1 of the gamma form coincide with molecule 1 of the alpha form, and consequently molecule 4 of the gamma form with molecule 2 of the alpha form. Conversely, molecules 2 and 3 of the gamma form have to perform large rotations to coincide with molecules 3 and 4 of the alpha form.

We now have to consider how such displacements, together with the transformation of the cell parameters, can be combined to achieve the transformation of the gamma cell into the alpha cell.

We propose here a very simple way to perform a progressive, step-by-step transformation. We prepared ten steps, each applying a regular one-tenth share of each geometrical transformation described above (displacements and rotations); the volume is also increased regularly by one tenth of the total change at each step, through suitable modifications of the cell parameters (V = abc sin β). An atom-atom potential calculation with energy minimisation is then performed for each step, with the following characteristics:




- In each step, the volume of the crystal-like cell remains fixed. Variations of the crystal-like cell parameters are not allowed during the course of minimisation.

- The six spatial parameters (center-of-mass coordinates and Eulerian angles) of each of the four molecules are allowed to vary during the course of minimisation.

- The central unit cell is surrounded by two shells of cells in the b and c directions and one in the a direction (the interactions between 420 molecules are thus taken into consideration).
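The ten-step schedule described above amounts to a linear interpolation of every geometrical parameter and of the cell volume between the gamma and alpha arrangements. The sketch below assumes a flat parameter vector whose layout and example values (a b-displacement of about b/4, a 30° rotation) are hypothetical; the fixed-cell energy minimisation performed at each step is not shown, and interpolating rotation angles linearly is a simplification.

```python
import numpy as np


def linear_path(start, end, n_steps=10):
    """Return n_steps + 1 parameter sets; step k applies k/n_steps of each change.

    start, end: 1-D arrays of equal length, e.g. centre-of-mass displacements,
    rotation angles and the cell volume (hypothetical layout).
    """
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    fractions = np.arange(n_steps + 1)[:, None] / n_steps
    return start + fractions * (end - start)


def path_barrier(energies):
    """Barrier along the path: highest minimised energy minus the starting one."""
    energies = np.asarray(energies, dtype=float)
    return float(energies.max() - energies[0])


# Hypothetical parameters: [b-displacement (Å), rotation (deg), volume (Å^3)],
# the volume going from the doubled gamma cell to the doubled alpha cell.
path = linear_path([0.0, 0.0, 610.75], [-1.5, 30.0, 613.62])
# Each row would then be passed to a fixed-cell minimisation (not shown) and
# the resulting energies to path_barrier.
```

Reading the activation energy as the maximum of the minimised energies along the path, relative to the starting structure, is our interpretation of the profile in Fig. 2.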

3.3 Method of calculation and results

p-Dichlorobenzene has been the subject of a number of calculations by the atom-atom potential method, which have recently been reviewed (ref. 10) and compared with the relative stability order of the polymorphs. Nevertheless, according to Reynolds, the choice of parameters for the chlorine atom in these methods is not trivial, because its several lone pairs of electrons lead to anisotropic interactions, whereas standard atom-atom potential formulae are purely isotropic. In this paper, the formalism of Claverie (ref. 14) has been used with the following parameters for the chlorine atom: k(Cl) = 1.1, R(vdW) = 1.76 Å. For these calculations, we have taken X-ray data for the geometric characteristics and ab initio STO-3G charges. An activation energy of 7.5 kcal/mole (see Fig. 2) is found for the gamma <--> alpha phase transition. This low potential barrier agrees with the ease of the phase transition.


Fig.  2.  Variation  of  packing  energy  along  the  pathway  simulating  the  nucleus 
formation. 




CONCLUSION 

In this work we have considered nucleus formation as cooperative displacements of molecules. The calculated low energy barrier is in agreement with the ease with which p-dichlorobenzene undergoes the phase transition between the gamma and alpha forms. Our aim in the future is to gain more insight into the determination of the critical nucleus size, determined by a balance between the surface energy of the nucleus and the bulk energy due to the formation of the product phase.
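The balance between surface and bulk terms mentioned above is usually expressed by classical nucleation theory. The sketch below uses that textbook model, not the authors' method, for a spherical nucleus of n molecules: ΔG(n) = -nΔμ + a n^(2/3), where Δμ is the bulk free energy gained per molecule and a = γ(36πv²)^(1/3) combines the surface energy γ and the molecular volume v. All inputs are hypothetical.

```python
import math


def nucleation_free_energy(n, delta_mu, a):
    """Delta G(n) = -n * delta_mu + a * n**(2/3) for a nucleus of n molecules."""
    return -n * delta_mu + a * n ** (2.0 / 3.0)


def surface_coefficient(gamma, v_mol):
    """a = gamma * (36 pi v_mol^2)^(1/3): area of a sphere of n*v_mol is a/gamma * n^(2/3)."""
    return gamma * (36.0 * math.pi * v_mol ** 2) ** (1.0 / 3.0)


def critical_nucleus(delta_mu, a):
    """Maximum of Delta G(n): n* = (2a / (3 delta_mu))^3, Delta G* = 4 a^3 / (27 delta_mu^2)."""
    n_star = (2.0 * a / (3.0 * delta_mu)) ** 3
    dg_star = 4.0 * a ** 3 / (27.0 * delta_mu ** 2)
    return n_star, dg_star


# Hypothetical reduced units: delta_mu = 1, a = 3  ->  n* = 8, Delta G* = 4.
n_star, dg_star = critical_nucleus(1.0, 3.0)
```

With realistic values of γ, v and Δμ, the same expressions would give a critical size that could be compared with the figure of about 1000 molecules quoted by Reynolds.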
Monte Carlo simulations of the
Acetate-Guanidinium ion pair in water


Stephane  Boudon  *,  Georges  Wipff 

Institut de Chimie, 4, rue B. Pascal, Strasbourg 67000, France


Abstract 

Monte Carlo simulations have been carried out on the guanidinium/acetate ion pair in water at 25°C and 1 atm. The free energy profile for the separation of the two ions has been calculated using statistical perturbation theory. The potential of mean force between the two ions, held in the configuration giving the best interactions (the doubly bound C2v configuration), shows three minima, corresponding respectively to intimate, solvent-separated and "infinitely" separated ion pairs. There is no clear preference for the intimate pair (C-C = 3.3 Å). The intermediate state (C-C = 6.3 Å) displays cooperative binding between the ions via two water molecules forming a double hydrogen-bond bridge. The two extreme states (C-C = 3.3 Å and 8.7 Å respectively) are separated by a large energy barrier. The computed results provide insight into the effects of the solvent on an important biochemical system involving arginine and aspartic or glutamic acid residues, and on anchoring sites of proteins.
Although electrostatic interactions in proteins are recognized as a major driving force determining their structure and function (1), it is still a problem to account for the "effective" attraction between charged groups in a given molecular environment. In this paper we report a theoretical study of the free energy profile (potential of mean force) (2) for the guanidinium/acetate ion pair (Gu+AcO-) in water, as a model of putative interactions between solvated arginine and glutamic or aspartic acid residues. Because peptides involving these charged groups might act as binding or recognition sites at the surface of adhesive proteins (3), we felt it important to gain insight into the solvation effects on these polar groups. Indeed, whereas the "gas phase" (Gu+AcO-) pair, as computed by ab initio MO-LCAO-SCF calculations, is firmly bound by about -130 kcal/mol (4), solvation effects could reduce or even prevent ion pairing.

Statistical mechanics simulations were carried out for the pair in water at 25°C and 1 atm using the isothermal-isobaric (N,P,T) ensemble (5). The system consisted of 390 water molecules plus the ion pair in a rectangular box (20 × 20 × 30 Å) with periodic boundary conditions. The Monte Carlo Metropolis (MC) procedure (6) was used, modified by preferential sampling (7-8). The solute-solvent statistics were enhanced by attempting to move the solute every 60 configurations. Volume changes were attempted on every 2500th configuration and involved scaling all the intermolecular separations. Configurations were generated by moving a water molecule or the solute (Gu+AcO-) randomly in all three Cartesian directions and by rotating it about a randomly chosen axis (9). The atom-atom interactions of the system (10-11) were calculated with a 9.0 Å cutoff. The MC runs involved equilibration over 10^6 configurations and averaging over 2.4 × 10^6 configurations for each step of the simulation. The free energy profile for the (Gu+AcO-) ion pair is computed in the framework of statistical perturbation theory (12).
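The move schedule and acceptance tests of the (N,P,T) Metropolis procedure described above can be sketched as follows. The volume-move rule assumes that the volume is sampled uniformly and that all N molecular centres are scaled; preferential sampling is omitted, and the function names and unit-conversion constants are illustrative, not taken from the authors' program.

```python
import math
import random

R_KCAL = 1.987204e-3        # gas constant, kcal/(mol K)
ATM_A3_TO_KCAL = 1.4584e-5  # 1 atm * Å^3 expressed in kcal/mol


def choose_move(step, solute_every=60, volume_every=2500):
    """Volume change on every 2500th configuration, solute move every 60th,
    otherwise a water molecule (preferential sampling not modelled)."""
    k = step + 1
    if k % volume_every == 0:
        return "volume"
    if k % solute_every == 0:
        return "solute"
    return "water"


def accept_move(delta_u, temperature, rng=random):
    """Metropolis test for a translation/rotation move (delta_u in kcal/mol)."""
    if delta_u <= 0.0:
        return True
    return rng.random() < math.exp(-delta_u / (R_KCAL * temperature))


def accept_volume(delta_u, v_old, v_new, n_molecules, pressure_atm,
                  temperature, rng=random):
    """NPT volume move with scaled molecular centres, V sampled uniformly:
    acc = min(1, exp(-beta (dU + P dV) + N ln(V_new / V_old)))."""
    beta = 1.0 / (R_KCAL * temperature)
    arg = (-beta * (delta_u + pressure_atm * (v_new - v_old) * ATM_A3_TO_KCAL)
           + n_molecules * math.log(v_new / v_old))
    return arg >= 0.0 or rng.random() < math.exp(arg)
```

For the box described above, v_old would start near 12000 Å³ (20 × 20 × 30 Å); counting the ion pair as one scaled centre would give N = 391, which is again an assumption of this sketch.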

In the simulation, the ion pair is maintained in C2v symmetry and the perturbation is done along the carbon-carbon (C-C) distance between the ions, from 3.30 to 8.70 Å in steps of 0.15 Å. Double-wide sampling was used so that 0.3 Å could be covered in one MC simulation. We carried out thirty-seven simulations with overlapping windows. At each step, the change in free energy is evaluated from the difference in ion-water interactions and gives the variation in the free energy of hydration. The total free energy change is computed by adding the cumulative difference in free energy to the gas-phase interaction energy between the two ions.
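Each window of the statistical perturbation calculation can be evaluated with the Zwanzig relation ΔG = -kT ln<exp(-ΔE/kT)>. The sketch below is a minimal version, assuming that ΔE is the change in ion-water interaction energy (kcal/mol) for configurations sampled in the reference window; reading the thirty-seven simulations as one window per 0.15 Å grid point is our assumption.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def zwanzig_delta_g(delta_e, temperature=298.15):
    """Free energy perturbation: dG = -kT ln < exp(-dE / kT) >_reference.

    delta_e: E(perturbed) - E(reference) for configurations sampled in the
    reference window, in kcal/mol.
    """
    kt = R_KCAL * temperature
    x = -np.asarray(delta_e, dtype=float) / kt
    x_max = x.max()                      # log-sum-exp for numerical stability
    return float(-kt * (x_max + np.log(np.mean(np.exp(x - x_max)))))


# Window centres from 3.30 to 8.70 Å every 0.15 Å (37 points). With double-wide
# sampling, a window at r_i yields dG(r_i -> r_i + 0.15) and dG(r_i -> r_i - 0.15)
# from the same set of configurations.
centres = np.linspace(3.30, 8.70, 37)
```

The log-sum-exp shift does not change the result; it only prevents overflow when some configurations have large negative ΔE.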

The calculated potentials of mean force for the forward and backward simulations are displayed in Figure 1. The fluctuations in the free energy changes averaged 0.17 kcal/mol per step. The hysteresis between the perturbation done by bringing the ions apart (forward simulation) and that done by bringing them together (backward simulation) is small and averaged 0.84 kcal/mol. The end point of the PMF curve at a distance of 8.70 Å is taken as the reference of the perturbation energy.
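Assembling the potential of mean force from the window increments, referencing it to the 8.70 Å end point and measuring the forward/backward hysteresis could look like the following sketch. The array layout and the definition of hysteresis as a mean absolute difference between the two curves are assumptions, and the numbers in the check are hypothetical.

```python
import numpy as np


def assemble_pmf(r, e_gas, dg_hyd_steps):
    """W(r_k) = E_gas(r_k) + cumulative hydration free energy change up to r_k,
    shifted so that W = 0 at the last (largest) distance.

    r: n distances (Å); e_gas: n gas-phase ion-ion interaction energies;
    dg_hyd_steps: n - 1 hydration free energy changes dG(r_k -> r_k+1) (kcal/mol).
    """
    e_gas = np.asarray(e_gas, dtype=float)
    cumulative = np.concatenate([[0.0], np.cumsum(dg_hyd_steps)])
    if len(cumulative) != len(r) or len(e_gas) != len(r):
        raise ValueError("need n distances, n gas-phase energies and n - 1 steps")
    w = e_gas + cumulative
    return w - w[-1]


def mean_hysteresis(w_forward, w_backward):
    """Average absolute difference between forward and backward PMF curves."""
    return float(np.mean(np.abs(np.asarray(w_forward) - np.asarray(w_backward))))


# Hypothetical three-point example (kcal/mol).
w_example = assemble_pmf([3.30, 3.45, 3.60], [-10.0, -5.0, -2.0], [3.0, 1.0])
```

For the backward run, the increments must first be expressed in the same direction (increasing distance) before they are accumulated.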

Although the uncertainties in the results are difficult to establish unequivocally, they show two well-defined minima. A sharp minimum for the contact ion pair occurs at a C-C distance of 3.15 Å, the NH...OC distance then being 1.55 Å (13). A second minimum occurs at a C-C separation of 6.45 Å and corresponds to a solvent-separated ion pair (Figure 2). The difference in energy between the contact ion pair and this solvent-separated ion pair is small, about 1.0 kcal/mol (14). The barrier between the two states is calculated to be 11.0 kcal/mol (15). A small barrier (approximately 2.0 kcal/mol) has to be overcome to separate the ions from the solvent-separated state, i.e. to disrupt the cooperative binding. A close analysis of the various configurations generated at 6.45-6.60 Å separation shows that two water molecules are cooperatively bound between the two ions (Figure 2) and have significantly reduced mobility compared with the other water molecules in the first hydration shell.


Figure 1. Potentials of mean force: calculated potential of mean force (kcal/mol) for the acetate/guanidinium ion pair in water along the C2v axis.

Figure 2. A typical configuration showing the cooperative binding of two water molecules with the ions at a C-C distance of 6.45 Å.
This calculated energy profile for the (Gu+AcO-) pair differs qualitatively from those obtained by similar computational techniques for more "spherical" ions in water (2). Indeed, the intimate and solvent-separated pairs appear as local minima and are unstable towards dissociation. At a C-C distance of 8.70 Å we find that the solvation free energy is still decreasing. Although some bias may have been introduced by the sampling procedure or via the cutoff distance, this may also be a genuine feature of this particular system.

In conclusion, we have determined the free energy profile for the guanidinium/acetate ions, which shows no clear tendency for ion pairing in solution. Our results have bearing on the formation of salt bridges in proteins (16). They suggest that polar recognition sites involving guanidinium/carboxylate pairing at the surface of adhesive proteins will be most effective if hydration is prevented (e.g. by lack of accessibility) and/or if additional stabilization results from the local topology of the electric field. Although the C2v arrangement is the minimum-energy one in the gas phase, (Gu+AcO-) pairing in proteins can adopt different arrangements (16). We are currently investigating this point.
SUMMARY

Computer-generated ionic site correlation functions were applied to polyionic macromolecules in order to obtain information about the relative probability of stereospecific interactions between them. The results show that it is possible to predict preferred mutual orientations of polyions and to estimate the most probable path of approach of interacting charged particles. The method of progressive approximations can be applied step-wise to long-range electrostatic interactions. Application of the analysis to highly charged model systems shows that it is possible to select geometric correlations between alternative polyionic conformations while neglecting most of the short-range molecular forces.


INTRODUCTION 

The method of progressive approximations is implicit in most theoretical approaches aimed at gaining insight into molecular interactions, and it has been used before to estimate interactions between biopolymers (ref. 1). The problems encountered in analyzing interactions between macromolecules are so formidable that approximations are essential. Prior attempts to explain and predict polyion interactions involved extensions of the Debye-Hueckel theory of ionic solutions to various macromolecular model systems (ref. 1), the primary aim being the derivation of their thermodynamic parameters in idealized solutions. Various types of symmetry were assumed to permit the use of geometric operators, in order to obtain closed-form expressions that could be solved by the methods of calculation available at the time.

Macromolecules of biological interest, however, are surrounded by such a
variety of components and confined to so many types of cellular compartments,
whose structures are still not sufficiently well understood, that it is
difficult to judge whether thermodynamic data obtained in solution can be
extrapolated to in vivo conditions to explain their biological behavior. A more
immediately successful approach may be to investigate structural correlations
using distance geometry, independently of detailed thermodynamic
considerations, which avoids complications at least in the initial stages of
the calculations. The advent of fast computers made possible such an
alternative approach to determining the probability of macromolecular complex
formation (ref. 2), and in recent years the crucial importance of electrostatic
interactions for recognition between charged macromolecules has received
renewed attention (ref. 3).
The use and analysis of simple fly-by ionic site correlation curves, obtained
with polyionic model structures, give insight into the relative probability of
structural complementarity, which is of great importance for understanding the
role of charged biological macromolecules such as proteins, nucleic acids and
phospholipids.

METHOD 

The approach using distance geometry starts by assuming that the distribution
of point charges located on the macro-ions is known and asks about the relative
probability of fitting these charge assemblies to each other or to other known
structures. The fitting probabilities are then used to select reference
structures, which in turn serve to generate new sets of model configurations of
polyions by progressively relaxing the initial approximations made to represent
the various ionic sites.

In general, any site correlation function used to estimate the strength of
molecular interactions decreases with distance and depends on the type of
interacting sites investigated: van der Waals interactions are estimated from
functions of the type f(1/Rij^6), interactions in the presence of counterions
from f(exp(-k*Rij)/Rij), and electrostatic interactions from the strength of
the electrostatic field, i.e. f(1/Rij^2), etc., where Rij is the interatomic
distance between atoms i and j and k the Debye-Hueckel parameter (ref. 1).
Fig. 1. Plots of 1/Rij^n for a positive charge moving along the X-axis at a
relative distance Y = 1 past a negative charge fixed at X = 0, Y = 0. The 5
curves, for n = 1, 2, 3, 6 and 12, are superimposed to show that the precision
of site location does not depend much on the value of the exponent n.

Fig. 1 summarizes a few selected fly-by correlation curves in order to show
that the exact type of site correlation function employed to estimate structural
parameters is not of primary importance (see also ref. 2). The localization of a
charged site by a moving probe does not depend much on the actual value of the
exponent of Rij, as long as it is negative and its absolute value is not much
smaller than 1. The method of fly-by ionic site correlation analysis was
developed to investigate the probability of long-range interactions and to
select preferred mutual orientations of approaching charged macromolecules
(ref. 2). For the present analysis the exponent of Rij was set equal to -2. To
account for charge attraction or repulsion, the individual 1/Rij^2 terms were
multiplied by Sij equal to +1 or -1, respectively.
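A minimal sketch of the signed fly-by correlation described above. It assumes point charges given as (x, y, z, q) tuples and a probe translated along X; the function name `flyby_correlation` and the tuple layout are illustrative choices, not part of the original method. Run with the geometry of Fig. 1, it checks that the peak position does not move with the exponent n.

```python
import math

def flyby_correlation(probe, target, shifts, exponent=2.0):
    """Signed fly-by ionic site correlation curve (illustrative sketch).

    probe, target: lists of point charges (x, y, z, q). The probe is
    translated along X by each value in shifts. For every intermolecular
    pair of charged sites the term 1/Rij**exponent is multiplied by
    Sij = +1 (opposite charges, attraction) or Sij = -1 (like charges,
    repulsion). Neutral sites are skipped.
    """
    curve = []
    for dx in shifts:
        total = 0.0
        for px, py, pz, qp in probe:
            for tx, ty, tz, qt in target:
                if qp == 0 or qt == 0:
                    continue
                r = math.dist((px + dx, py, pz), (tx, ty, tz))
                s_ij = 1.0 if qp * qt < 0 else -1.0
                total += s_ij / r ** exponent
        curve.append(total)
    return curve

# Fig. 1 geometry: a +1 probe at Y = 1 passing a -1 charge fixed at the origin.
shifts = [k / 100 for k in range(-500, 501)]
probe, target = [(0.0, 1.0, 0.0, +1)], [(0.0, 0.0, 0.0, -1)]
for n in (1, 2, 3, 6, 12):
    curve = flyby_correlation(probe, target, shifts, exponent=n)
    peak = shifts[curve.index(max(curve))]
    # analytic half-width (curve falls to half its peak) for Y = 1
    print(f"n={n:2d}  peak at X={peak:+.2f}  half-width={math.sqrt(2 ** (2 / n) - 1):.2f}")
```

All five curves peak at X = 0; only their width changes, from a half-width of about 1.73 (in units of Y) for n = 1 to about 0.35 for n = 12, which is the sense in which site location is insensitive to the exponent.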

As a first approximation the analysis is applied to a number of standard
geometric model structures and asks whether these structures have a propensity
to fit either each other or other structures known to be present in their
vicinity. For instance, any known protein sequence can be analyzed for the
presence of "one-dimensional" structural complementarity by assuming that each
amino acid carries one characteristic property and then investigating whether
complementary patterns can be detected in the sequence. Thus, the probability
that a polyion like poly-arginine interacts intermolecularly with itself is much
lower than that of a mixed polymer of poly-arginine and polyglutamic acid. In
going from the 1D approximation to 2D or 3D, one can determine whether charged
residues that are mutually attractive in 1D are distributed in such a pattern
that their charges are oriented along a particular radial direction when folded
into an alpha-helix (Fig. 2) or along one particular side of a 2D beta-sheet
model structure. Any increase in the probability of polyionic neutralization
would suggest that the structures involved are stabilized by such an
interaction and thus selected from the large variety of available competing
configurations. Examples of some of these analyses are shown in Figs. 2 to 4.

Fig. 2. Plot of (1/Rij^2)*Sij for a hypothetical alpha-helical probe of poly-Ala
with an internal section R(19-27)E(28-36), moving past an identical molecule
that is oriented antiparallel to the probe in a coordinate system. The fly-by
correlation curve shows one maximum of attraction and 2 minima of repulsion and
permits the pairwise alignment of the identical molecules corresponding to the
interaction maximum. This alignment determines the position of the dyad axis of
symmetry of the polyionic complex.
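The 1D complementarity test can be sketched as follows, assuming unit charges (+1 for Arg and Lys, -1 for Glu and Asp, 0 otherwise), equidistant points with unit spacing, and a copy of the chain slid past itself on a parallel line at a lateral distance of 1. The charges assigned to Lys and Asp, the spacing, the example chains and the helper names are assumptions made for illustration.

```python
# Assumed unit charges: Arg/Lys +1, Glu/Asp -1, all other residues neutral.
CHARGE = {"R": +1, "K": +1, "E": -1, "D": -1}

def string_model(seq):
    """Step 1.0: the sequence as equidistant points (unit spacing) on a line,
    each point carrying the charge of its residue."""
    return [(float(i), CHARGE.get(res, 0)) for i, res in enumerate(seq)]

def best_1d_alignment(seq, antiparallel=True, y=1.0, exponent=2.0):
    """Slide a copy of seq past itself on a parallel line at lateral distance y
    and return (shift, score) at the maximum of sum_ij Sij / Rij**exponent."""
    target = string_model(seq)
    probe = string_model(seq[::-1] if antiparallel else seq)
    best_shift, best_score = None, float("-inf")
    for shift in range(-(len(seq) - 1), len(seq)):
        score = 0.0
        for xp, qp in probe:
            for xt, qt in target:
                if qp == 0 or qt == 0:
                    continue
                r2 = (xp + shift - xt) ** 2 + y * y
                s_ij = 1.0 if qp * qt < 0 else -1.0  # attraction +1, repulsion -1
                score += s_ij / r2 ** (exponent / 2)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score

print(best_1d_alignment("R" * 10))           # poly-Arg with itself
print(best_1d_alignment("R" * 5 + "E" * 5))  # Arg block followed by a Glu block
```

For poly-arginine every pair repels, so even the best score is negative; the Arg-block/Glu-block chain, aligned antiparallel, reaches a positive maximum at zero shift, i.e. with each Glu facing an Arg.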


Fig. 3. Fly-by correlation curve of an artificial polyion, polyalanine (Ala)50
with blocked ends, Glu at positions 19, 26, 33 and 40 and Arg at positions 22,
29, 36 and 43, designed to mimic the central segment of histone H4 as an
alpha-helix. The dotted line through the maximum of the correlation curve
indicates the position of maximum intermolecular charge neutralization and
locates the dyad axis of the corresponding intermolecular dimer with high
precision. Note that all peaks are separated by distances close to 20 Å, i.e.
the diameter of a normal B-DNA double helix.

To  summarize,  a  general  approach  to  investigating  polyionic  interactions  of 
linear  charged  macromolecules  would  involve  the  following  steps: 

1.0. Representation of the primary monomer sequence by a one-dimensional string
of equidistant points.

1.1. The 1D analysis may be improved by imposing that the distances separating
the points correspond to the known atomic backbone separations between monomer
groups.

1.2. A further improvement of the model would be to replace the points by
polygons or spheres with residue-specific radii, so that their volumes equal the
van der Waals volumes of the particular monomers.

2.0. The next step would extend the compressed information of the string model
into two dimensions, for instance by assuming that the primary sequence of a
peptide is folded into a beta-sheet. In this case the properties of the primary
sequence could be represented by points located at the positions of the
corresponding alpha-carbons as the initial 2D structural approximation.

2.1. The 2D analysis could be improved by turning the points at the alpha-carbon
positions into polygons with dimensions determined, as above, from the van der
Waals volumes of the corresponding residues.

3.0. The next step would concern the representation of primary sequences in the
form of idealized 3D configurations of known dimensions. Thus, DNA sequences
may be folded into paired or unpaired helices of particular conformations in
order to simulate native or denatured DNA, or structures such as B-DNA, A-DNA,
C-DNA etc. Similarly, proteins may be folded into ideal alpha-helices,
beta-sheets, beta-turns etc. In general, the 3D analysis should start with those
structures known to dominate in the case of interest. For instance, there is
little incentive to study beta-sheets for histone proteins, which are known to
be 55 ± 5 % alpha-helical in native chromatin and show no evidence of
beta-sheet regions in the presence of DNA (ref. 4-5).

Thus, 3D approximations for proteins rich in alpha-helices would start by
representing the amino acid sequence as a cylinder with the dimensions of an
alpha-helix, i.e. a radius of 2.3 Å and a length equal to (N-1)*1.47 Å, where N
is the number of residues in the sequence. The residue properties would be
concentrated onto points on the cylinder surface corresponding to the locations
of the alpha-carbon atoms.

3.1. As for the 1D or 2D models, the next step would expand the points into
polygons with residue-specific dimensions and properties spread over the
external surface.

3.2. The next step would replace the polygons by rigid, radially extended
residues with their properties concentrated on points located at the
corresponding atomic positions.

3.3.  The  next  step  would  replace  the  points  by  the  actual  atoms  of  the  residues 
surrounded  by  van  der  Waals  spheres. 

3.4. The rigidity of the residues could be relaxed in the next approximation
step, by replacing them with time-averaged rotamers or with other models aimed
at representing their dynamic properties.

Similarly, 3D approximations for DNA sequences would start by representing
the base pairs as a cylinder with the dimensions of the B-DNA double helix,
i.e. a radius of 10.2 Å and a length equal to (N-1)*3.38 Å, where N is the
number of base pairs. The base-pair properties would be concentrated onto
points on the cylinder surface corresponding, for instance, to the locations
of the phosphate atoms. Steps 3.1 to 3.4 would be similar to those for proteins,
except that the base pairs could be better simulated by cylinder slices rather
than by linear, radially extended strings.
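The cylinder representations of step 3.0 can be sketched as points wound helically on a cylinder. The radii and rises (2.3 Å and 1.47 Å per residue for the alpha-helix, 10.2 Å and 3.38 Å per base pair for B-DNA) come from the text; the angular steps (100° per residue, i.e. 3.6 residues per turn, and an idealized 36° per base pair) and the function name are assumptions not given in the text.

```python
import math

def cylinder_points(n_units, radius, rise, twist_deg):
    """Place n_units points helically on a cylinder of the given radius.

    Point k sits at height k*rise and azimuth k*twist_deg, so the cylinder
    length is (n_units - 1)*rise, as in the text. twist_deg is an assumed
    value supplied by the caller.
    """
    points = []
    for k in range(n_units):
        phi = math.radians(k * twist_deg)
        points.append((radius * math.cos(phi), radius * math.sin(phi), k * rise))
    return points

# Alpha-helix: radius 2.3 A and rise 1.47 A per residue (text);
# 100 deg per residue is an assumed standard value.
helix = cylinder_points(18, radius=2.3, rise=1.47, twist_deg=100.0)
# B-DNA: radius 10.2 A and rise 3.38 A per base pair (text);
# 36 deg per base pair (10 bp per turn) is an assumed idealized value.
dna = cylinder_points(12, radius=10.2, rise=3.38, twist_deg=36.0)
print(f"helix length {helix[-1][2]:.2f} A, DNA length {dna[-1][2]:.2f} A")
```

Each point can then carry the residue or phosphate property (for example a charge) for the fly-by analysis; steps 3.1 to 3.4 would replace these points by progressively more detailed objects.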

Many further steps of successive approximation may be required before a
satisfactory answer is found for the prediction of biological structures in the
in vivo environment, but it was very instructive to find that even the initial
steps of the analysis, using the very simple approximations outlined above,
gave conclusive geometric parameters for molecular alignment. This observation
is important, since the more relevant information simple approximations
contain, the less effort is needed to understand the phenomena under
consideration.
For instance, if long-range interactions between protein alpha-helices could be
estimated correctly by treating the helices as idealized, rigid cylinders, it
would be considerably easier to investigate the behavior of the remaining
peptide sequences. The possibility of selecting those peptides that bind
stereospecifically into one or the other of the two principal grooves of
double-helical DNA, as suggested (ref. 9-11), would considerably facilitate the
search for reasonable configurations of any adjacent, less rigidly bound
peptide regions.

Thus, one of the essential features of the procedure of successive, step-by-step
approximations of polyion interactions is the analysis, after each step, of the
relevance of the results to the problem at hand. The data presented in Fig. 4
may be used to illustrate the initial step of such an analysis. Suppose that the
effect of diamines on the protection of DNA against melting can be rationalized
by the very rudimentary model used, i.e. one consisting of only 2 negatively
charged phosphate sites and 2 positively charged amino groups located in a
vacuum. This observation, derived from the first step of the approximation,
then implies that simple coulombic interactions are the main driving force of
the observed phenomenon. The analysis correctly predicts that the diamine with
5 C-atoms (i.e. cadaverine) interacts most strongly with DNA and protects it
most efficiently against heat denaturation. It shows, furthermore, that this
prediction is due to the correspondence between the distance separating the
phosphates along the backbone of B-form DNA, which is 6.854 Å, and the distance
between the 2 N-atoms of extended cadaverine molecules. In addition, the
simulation shows that the selectivity is large enough to differentiate between
diamines that differ by only one C-atom, which is equivalent to a difference in
the N-N distance of only about 0.88 Å if the diamines are assumed to be in their
extended configuration. That this assumption is very plausible is confirmed by
the drop in DNA protection by diamines with more than 5 C-atoms: the amino
groups remain maximally separated, and the carbon skeleton does not bend
significantly into a conformation that would bring the two amino groups close
enough together for their separation to match that of the phosphates of the
DNA backbone.

That the data can be simulated at such a low level of sophistication, i.e.
without taking into account solvent and counterion effects, partial charges or
charge fluctuations, may be due to fortuitous averaging or compensating effects,
or it may indicate that these parameters are of minor importance. Only
additional model experiments specifically aimed at resolving such questions will
show which of the possible answers is relevant.

A more detailed investigation could aim at simulating the effects of DNA base
composition and/or base sequence on the melting data in the presence of
diamines, or at understanding in detail why solvent and partial charges need
not be taken into account in order to derive basic structural correlations
about overall molecular complementarity between charged molecules. Varying some
of the critical parameters used to calculate the fly-by correlation curves,
such as the distance of closest approach between molecules, the value of the
exponent used for the distance correlation, or the values of +1 and -1 assigned
to the point charges, may also be instructive and may highlight some additional
applications of the method.

The answer concerning the relevance of the results is important for selecting
the next step towards making the approximations more realistic. As experience
accumulates in the successful prediction of experimental data, a set of
criteria will evolve, which can in turn be used to improve the success of
predictions in ever greater detail.


Fig. 4. Superposition of experimental and theoretical data, to demonstrate
typical applications of the methodology described. Curve 4A (solid line)
represents the relative change in the midpoint melting temperature, Tm, of DNA
in the presence of linear diamines of different carbon content. The Y-values
were obtained by averaging the Tm data over the number (in parentheses) of DNA
samples used (ref. 7) and by normalizing them to 1 for the maximum average Tm
observed. The standard error bars primarily reflect the dependence of Tm on the
differing GC-content of the DNA samples used. The X-values represent the number
of diamine C-atoms separating the terminal, positively charged amino groups.
Curves 4B (dotted line) and 4C (dashed line) represent the relative change of
the maximum Y-values obtained from fly-by correlation curves of various diamines
interacting with a fixed diphosphate whose charge separation equals that of the
B-DNA backbone, i.e. 6.85 Å. The diamines were simulated by positive point
charges separated by distances corresponding to those of the extended
C-backbone. Curve 4B results from a fly-by at Y = 1 Å and curve 4C at Y = 2 Å.
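A minimal sketch of the diphosphate/diamine comparison behind Fig. 4, under stated assumptions: both charge pairs lie on parallel lines along X, separated laterally by the fly-by distance Y, and only the four intermolecular phosphate-amine pairs are summed (all are attractive, so Sij = +1 and each term is 1/Rij^2). The text does not list N-N distances for the individual diamines, so instead of assigning them the sketch scans the N-N separation and reports which one gives the strongest maximum; the function names and scan ranges are illustrative.

```python
def flyby_max(d_nn, y, d_pp=6.85, x_range=10.0, step=0.02):
    """Maximum over X of the fly-by curve of a diamine (two +1 charges d_nn
    apart) passing a fixed diphosphate (two -1 charges d_pp apart) at lateral
    distance y. All four intermolecular pairs attract, so each term is 1/R^2."""
    phosphates = (-d_pp / 2, d_pp / 2)
    n_steps = round(2 * x_range / step)
    best = float("-inf")
    for k in range(n_steps + 1):
        x = -x_range + k * step
        amines = (x - d_nn / 2, x + d_nn / 2)
        total = sum(1.0 / ((xa - xp) ** 2 + y * y)
                    for xa in amines for xp in phosphates)
        best = max(best, total)
    return best

def best_separation(y, d_min=4.0, d_max=10.0, d_step=0.02):
    """N-N separation (scanned on a grid) giving the strongest fly-by maximum."""
    n = round((d_max - d_min) / d_step)
    separations = [d_min + k * d_step for k in range(n + 1)]
    maxima = [flyby_max(d, y) for d in separations]
    return separations[maxima.index(max(maxima))]

for y in (1.0, 2.0):  # the two fly-by distances quoted in the caption
    print(f"Y = {y:.0f} angstrom: strongest maximum at N-N = {best_separation(y):.2f} angstrom")
```

With Y = 1 Å the strongest maximum falls close to the 6.85 Å phosphate spacing, consistent with the matching argument made for cadaverine; normalizing the maxima to 1, as in the caption, would give curves comparable to 4B and 4C once N-N distances are assigned to specific diamines.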
CONCLUSION:

The results presented show that the actual type of correlation function used
to analyze long-range intermolecular ionic interactions is of only minor
importance, because the correlation curves are relatively insensitive to the
value of the exponent of the distance parameter (Fig. 1). This in turn suggests
that, as far as overall conformational selection is concerned, the effects of
solvent interactions, counterions and electric polarization are dominated by
long-range electrostatic interactions when the interacting molecules are highly
charged species. That this is indeed the case becomes clear, for instance, when
the melting data of DNA in the presence of various diamines (ref. 7) are
compared with data obtained from ionic site correlation curves (Fig. 4).

The results show that fly-by ionic site correlation curves are very sensitive
to the location of charged sites as well as to the amount and sign of the
charges present. This makes it possible to estimate interaction maxima and
minima as a function of molecular configuration and orientation, and thus the
preferred mutual orientations of interacting polyionic species. The method of
analysis has been applied with encouraging results to complex, highly charged
macromolecules, for instance histones, the basic nuclear proteins, as well as
acidic polynucleotides like DNA, in order to test the self-consistency of the
high-resolution molecular model for chromatin subunits (ref. 6), which is now
available as a colored, three-dimensional computer graphics display (ref. 8).
ABSTRACT:

We have performed methodological studies using "High Temperature Annealed
Molecular Dynamics Simulations" (HTAMDS) on a bicyclic cryptand, with the
following protocol: 100 ps of molecular dynamics at 1000 K, followed by
optimisation of the structures saved every 0.2 ps, relaxation of these
structures during 20 ps of molecular dynamics at 300 K and, finally,
optimisation. Four sets of 500 low-energy conformers of the free 222 cryptand
have been produced, starting either from the free cryptand or from the M+/222
cryptate with different representations of M+.

The analysis of these four sets allows assessment of the ability of this
HTAMDS technique: (i) to interconvert experimentally known conformers starting
from one of them, (ii) to locate the energy minima, (iii) to generate new
conformers of low energy, and (iv) to account for the average structure
observed on the NMR time scale. In view of the ionophoric behavior of the
cryptand, structures are analyzed in terms of the "in/out" orientation of the
binding sites.

It is found that the annealed simulations on the free molecule, although they
sample the conformational space widely, do not give structures adequate for
cation inclusion. They generate the lowest-energy structure known
experimentally and other new, closely related ones. Inclusion of the cation in
the simulation (either as a purely electrostatic "driver" or as a charged
sphere) leads to conformations found in several complexes.

In terms of "receptor design" it is thus essential to consider explicitly the
host-guest complex in order to find conformations suitable for binding.
Conversely, in the field of "drug design", conformations of a drug recognized
by its receptor may not be found by HTAMDS if the receptor (generally of
unknown structure) is not taken into account.


† To be published in J. Comput. Chem., 1990, 11, 000.
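The HTAMDS schedule in the abstract can be written down as a small driver. The abstract does not name the force field or MD engine, so `run_md` and `minimize` below are hypothetical callables supplied by the user; only the temperatures, durations and sampling interval come from the text. A stand-in engine is used for a dry run.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass(frozen=True)
class HTAMDSProtocol:
    """Schedule quoted in the abstract (temperatures in K, times in ps)."""
    hot_temperature: float = 1000.0
    hot_duration: float = 100.0
    save_every: float = 0.2
    relax_temperature: float = 300.0
    relax_duration: float = 20.0

    @property
    def n_snapshots(self) -> int:
        # 100 ps saved every 0.2 ps gives 500 structures per run
        return round(self.hot_duration / self.save_every)

# Hypothetical engine interface (not from the source):
#   run_md(structure, temperature, duration, save_every) -> list of saved frames
#   minimize(structure) -> locally optimised structure
MD = Callable[[Any, float, float, float], List[Any]]
Minimizer = Callable[[Any], Any]

def run_htamds(start: Any, protocol: HTAMDSProtocol,
               run_md: MD, minimize: Minimizer) -> List[Any]:
    hot_frames = run_md(start, protocol.hot_temperature,
                        protocol.hot_duration, protocol.save_every)
    conformers = []
    for frame in hot_frames:
        quenched = minimize(frame)  # optimise each saved high-T structure
        # 300 K relaxation; saving once per relax_duration keeps the final frame
        relaxed = run_md(quenched, protocol.relax_temperature,
                         protocol.relax_duration, protocol.relax_duration)[-1]
        conformers.append(minimize(relaxed))  # final optimisation
    return conformers

# Stand-in engine for a dry run: "MD" returns copies of its input, one per saved frame.
def fake_md(structure, temperature, duration, save_every):
    return [structure] * round(duration / save_every)

conformers = run_htamds("start", HTAMDSProtocol(), fake_md, lambda s: s)
print(len(conformers))  # 500
```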
SUMMARY

The structure of the phthalocyanine dimer is determined by theoretical
calculations using an atomic pair potential function (R^-1-4-6-12 type)
proposed by S. Fraga. A face-to-face, slipped structure is reported for the
dimer. Structural and oxidation state effects on the association energy are
discussed. Two clear advantages appear when oxidation state effects are
included in the description of the phthalocyanine dimer: 1) the association
energy is increased; 2) the interaction energy is less sensitive to the
geometrical parameters (distance between the molecular planes and rotation
angle).


INTRODUCTION 

Aggregation is a well-known phenomenon in phthalocyanine chemistry.
Interactions can occur between adjacent phthalocyanine rings in both organic
and aqueous phases, resulting in coupling between the electronic states of two
or more phthalocyanine units (ref. 1).

Phthalocyanine compounds such as the metallophthalocyanines are known to have
semiconducting properties. When metallophthalocyanines or metal-free
phthalocyanine are partially oxidized with a halogen (typically I2), they
become conducting while simultaneously adopting a face-to-face stacking
(ref. 2).

METHOD 

A  theoretical  study  on  the  intermolecular  interaction  between  two 
phthalocyanine  molecules  has  been  carried  out  by  using  an  atom-to-atom  pair 
potential  formulation  proposed  by  S.  Fraga  (ref.  3-4).  The  potential  is 
defined  as: 

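$$
E_{ij} = 1389.4168\,\frac{q_i q_j}{r_{ij}}
 - 694.70838\,\frac{f_i \alpha_i q_j^2 + f_j \alpha_j q_i^2}{r_{ij}^4}
 - 1516.0732\,\frac{f_i f_j \alpha_i \alpha_j}{\left[(\alpha_i/n_i)^{1/2} + (\alpha_j/n_j)^{1/2}\right] r_{ij}^6}
 + 4.184\,\frac{c_i c_j}{r_{ij}^{12}} \qquad (1)
$$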

where r_ij is the distance between atoms i and j, and q (atomic net charge),
f (optimized fitting factor), α (atomic polarizability), n (atomic effective
number of electrons) and c (overlap coefficient) are assigned to each class
(a, b) of the atoms (i, j) belonging to molecules (A, B), according to a
classification given by Clementi (ref. 5).

[Figure: panel a (b, c) corresponds to the a (b, c) minimum in Table 1.]

DEVELOPMENT  OF  THE  POTENTIAL  AND  NEW  IMPLEMENTATIONS 

An empirical correction to the dispersion energy is included by adding a new
R^-6 term (ref. 6) to the original potential as a second R^-6 term (ref. 7-9).

The R^-4 and the two R^-6 terms are damped using a formula due to Douketis et
al. (ref. 10):

$$E_{\mathrm{DAMP}}(R) = E(R)\, f(R)\, g(R,n) \qquad (2)$$

The g function is given by:

$$g(R,n) = \left[1 - \exp\left(-\frac{2.1\,R}{n} - \frac{0.109\,R^2}{n^{1/2}}\right)\right]^n \qquad (3)$$

and corrects for overlap effects. The values of n in eq. (2) are 4 and 6 for
the R^-4 and R^-6 terms, respectively. The f function is given by:

$$f(R) = 1 - R^{1.68} \exp(-0.73\,R) \qquad (4)$$

and corrects for exchange effects.
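Eqs. (2)-(4) translate directly into code. A minimal sketch, using the coefficients as printed; the text does not state the units (or any scaling) of R in eqs. (3) and (4), so which reduced distance to pass in is an assumption left to the caller, and the function names are illustrative.

```python
import math

def g_overlap(r, n):
    """Overlap damping g(R, n), eq. (3)."""
    return (1.0 - math.exp(-2.1 * r / n - 0.109 * r * r / math.sqrt(n))) ** n

def f_exchange(r):
    """Exchange correction f(R), eq. (4), with the coefficients as printed."""
    return 1.0 - r ** 1.68 * math.exp(-0.73 * r)

def damped(energy, r, n):
    """Damped term, eq. (2): E_DAMP(R) = E(R) f(R) g(R, n), with n = 4 for the
    R^-4 term and n = 6 for the R^-6 terms."""
    return energy * f_exchange(r) * g_overlap(r, n)
```

As R grows both factors tend to 1, so the undamped R^-4 and R^-6 terms are recovered; at small R, g(R, n) behaves like (2.1 R/n)^n, which cancels the R^-n divergence of the corresponding term.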

Two different procedures for renormalizing the molecular electrostatic charge
have been implemented: 1) local renormalization of the electrostatic charge in
a molecular fragment; 2) global renormalization of the electrostatic charge
over the whole molecule.

A  variable  metric  algorithm  has  been  implemented  in  order  to  optimize 
various  starting  geometries. 

RESULTS 

Face-to-face, slipped (see Figure 1) and face-to-face (see Figure 2) stacking
configurations of the phthalocyanine dimer have been identified as the most
important orientations in the crystal structures of β-phthalocyanine (ref. 11)
and of halogen-doped salts (ref. 12). The association energies and distances of
these minima are reported in Table 1.

In order to test the effect of the oxidation state of the phthalocyanine
molecules, the structures of the dimer are optimized with various oxidation
numbers. The results are reported in Table 2.

A different parameterization of Fraga's potential has also been tested. "Ab
initio" STO-3G net charges (ref. 13) have been used to calculate the R^-1 and
R^-4 energy components in eq. (1). Atomic polarizabilities have also been
interpolated for the new charges in order to calculate the R^-4 and R^-6 energy
terms in eq. (1). The results are shown in Table 3 for neutral molecules and in
Table 4 for various oxidation states.
TABLE 1

Energies and geometrical parameters of the phthalocyanine dimer.

minimum  type(a)  -E         horizontal    vertical      rotation
                  (kJ/mole)  slipping (Å)  stacking (Å)  angle (°)
a        FFS      236.0      7.8           3.2            0.0
b        FFS      233.3      7.9           3.2           93.0
c        FFS      230.6      8.0           3.1            0.0
d        FF         5.1      0.0           3.4            0.0
e        FF         6.7      0.0           3.4           10.0
f        FF         3.9      0.0           3.4           90.0

(a) Symbols employed: FFS: Face-to-Face, Slipped; FF: Face-to-Face.


TABLE 2

Association energies (-E, kJ/mole) for various oxidation numbers of the
phthalocyanine dimer.

                   oxidation states of molecules A;B
minimum  type(a)   0;0      1/3;1/3    0;1
a        FFS       236.0    243.6      276.1
b        FFS       233.3    240.2      273.6
c        FFS       230.6    236.9      268.4
d        FF          5.1     55.1        -*
e        FF          6.7     57.3      117.5
f        FF          3.9     54.1      116.2

(a) See footnote of Table 1.
* The trial geometry of the d minimum optimizes into the final geometry of the
e minimum.


DISCUSSION 

Two basic stacking structures (face-to-face slipped, minima a-c, and
face-to-face, minima d-f) are important for describing the phthalocyanine dimer
(see Table 1). The slipped minima a-c are by far the most stable. Horizontal
slipping reduces the vertical distance between the molecular planes by 0.2 Å.
Changes in the rotation angle have only a small effect on the association
energies.

However, the effect of the oxidation state is very significant: when the
charges of the molecules are increased, the association energies rise markedly
(see Table 2). In particular, the less stable face-to-face structures are
strongly stabilized, and their energy difference from the most stable minimum
decreases considerably.
TABLE 3

Energies and geometrical parameters of the phthalocyanine dimer. STO-3G net
charges are used in eq. (1).

minimum  type(a)  -E         horizontal    vertical      rotation
                  (kJ/mole)  slipping (Å)  stacking (Å)  angle (°)
a        FFS      -*         -             -             -
b        FFS      -†         -             -             -
c        FFS      -*         -             -             -
d        FF       381.2      0.0           3.3            0.0
e        FF       -*         -             -             -
f        FF       388.3      0.0           3.2           90.0

(a) See footnote of Table 1.
* The trial geometries of the a, c, and e minima optimize into the final
geometry of the d minimum.
† The trial geometry of the b minimum optimizes into the final geometry of the
f minimum.


TABLE 4

Association energies (-E, kJ/mole) for various oxidation numbers of the
phthalocyanine dimer. STO-3G net charges are used in eq. (1).

                   oxidation states of molecules A;B
minimum  type(a)   0;0      1/3;1/3    0;1
a        FFS       -*       -*         -*
b        FFS       -†       -†         -†
c        FFS       -*       -*         -*
d        FF        381.2    365.9      399.8
e        FF        -*       -*         -*
f        FF        388.3    373.3      407.4

(a) See footnote of Table 1.
* The trial geometries of the a, c, and e minima optimize into the final
geometry of the d minimum.
† The trial geometry of the b minimum optimizes into the final geometry of the
f minimum.


No improvement has been obtained with the use of "ab initio" STO-3G net
charges in eq. (1). With this parameterization no face-to-face, slipped
structure appears in Table 3; only two stacked face-to-face minima (d and f)
are found. The effect of the oxidation state is weaker when STO-3G net charges
are used (see Table 4).

A multipolar analysis of the electrostatic properties of the phthalocyanine molecule is instructive at this point. STO-3G net charges give too low a quadrupole moment (57.6 atomic units) compared with the standard one (358.4 a.u.). A low quadrupole moment favours face-to-face structures, while a high quadrupole moment favours face-to-face, slipped minima, in better agreement with the crystal structure of β-phthalocyanine (ref. 11). This effect has been described previously for benzene, s-tetrazine, and their mixed dimer (ref. 14).
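To make the comparison of quadrupole moments concrete, the traceless quadrupole tensor of a set of atomic net charges can be evaluated directly. The sketch below assumes the Buckingham convention, Θ_ab = ½ Σ_i q_i (3 r_ia r_ib − r_i² δ_ab); the convention behind the 57.6 and 358.4 a.u. values is not stated here, and the function name and test charges are illustrative only. For a neutral molecule without a dipole moment, such as phthalocyanine, the result does not depend on the origin; for a planar molecule, the component along the plane normal (Θ_zz) is the one usually discussed for stacking.

```python
import numpy as np

def traceless_quadrupole(charges, coords):
    """Traceless quadrupole tensor of point charges (Buckingham convention).

    charges: net atomic charges q_i; coords: positions r_i, shape (n, 3).
    Units follow the inputs (electrons and bohr give atomic units).
    """
    q = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    r2 = np.einsum("ia,ia->i", r, r)                    # |r_i|^2
    second_moment = np.einsum("i,ia,ib->ab", q, r, r)   # sum_i q_i r_ia r_ib
    return 0.5 * (3.0 * second_moment - np.sum(q * r2) * np.eye(3))

# Linear test distribution: +1 at z = +1 and z = -1, -2 at the origin
theta = traceless_quadrupole([1.0, 1.0, -2.0],
                             [[0, 0, 1], [0, 0, -1], [0, 0, 0]])
print(np.diag(theta))   # diagonal (-1, -1, 2); the trace is zero
```

Feeding in the net charges and coordinates of the molecule would reproduce the kind of comparison made above between the STO-3G and standard charge sets.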

Fraga's atom-atom pair potential, as implemented in the AMYR program (refs. 3-4), allows the association energy to be minimized with respect to the six intermolecular degrees of freedom (three translational and three rotational) of molecule B treated as a rigid whole. Three derivative-based methods are implemented for the geometry optimization: gradient steepest descent, the rank-one variable-metric method of Davidon (ref. 15), and the rank-two variable-metric Broyden-Fletcher-Goldfarb-Shanno (BFGS) method (ref. 16).
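Of the three minimizers listed, the rank-two BFGS scheme is the most widely used. The sketch below shows its inverse-Hessian update combined with a simple Armijo backtracking line search; it is a generic illustration rather than the AMYR code, and the line-search constants and the quadratic test surface are assumptions. In the dimer problem, `x` would hold the six intermolecular coordinates of molecule B.

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Quasi-Newton minimization with the BFGS (rank-two) inverse-Hessian update."""
    x = np.asarray(x0, dtype=float)
    identity = np.eye(x.size)
    h_inv = identity.copy()              # initial inverse-Hessian guess
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -h_inv @ g                   # quasi-Newton search direction
        fx, t = f(x), 1.0
        # Armijo backtracking: halve the step until the decrease is sufficient
        while f(x + t * p) > fx + 1e-4 * t * (g @ p) and t > 1e-12:
            t *= 0.5
        s = t * p
        g_new = grad(x + s)
        y = g_new - g
        sy = s @ y
        # Update only if the curvature condition holds (keeps h_inv positive definite)
        if sy > 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
            rho = 1.0 / sy
            h_inv = ((identity - rho * np.outer(s, y)) @ h_inv
                     @ (identity - rho * np.outer(y, s)) + rho * np.outer(s, s))
        x, g = x + s, g_new
    return x

# Stand-in for a six-coordinate energy surface: a convex quadratic bowl
rng = np.random.default_rng(0)
m = rng.normal(size=(6, 6))
hess = m @ m.T + 6.0 * np.eye(6)
center = rng.normal(size=6)
energy = lambda x: (x - center) @ hess @ (x - center)
gradient = lambda x: 2.0 * hess @ (x - center)
x_min = bfgs_minimize(energy, gradient, np.zeros(6))
```

Unlike steepest descent, the accumulated curvature information lets the method cope with strongly coupled coordinates, such as a translation and a rotation that move the same atoms.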

Fraga's potential is thus well suited to this problem: analytical energies and gradients are inexpensive to evaluate, and only a small number of degrees of freedom is involved. It should be noted, however, that eq. (1) requires a double summation, over the atoms of molecule A and over those of molecule B, so that its cost is formally proportional to the square of the number of atoms in the molecules. There is therefore great interest in vectorizing this pair-potential algorithm (ref. 17).

The basic approach to exploiting vector processors is to calculate the interactions of Fraga's potential in a double loop over the atoms of molecules A and B, storing in square matrices information about the potential, such as the inverse powers of the distance and the specific energy terms of eq. (1). This is possible because the atom-atom interaction calculations are independent of one another. The analytical gradient of Fraga's potential has the same properties, including the independence of the atom-atom interactions.
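The double loop and its matrix (vectorized) counterpart can be sketched with NumPy broadcasting. Because eq. (1) is not reproduced in this section, the sketch assumes a generic atom-atom form, E = Σ_i Σ_j [q_i q_j / r_ij + A / r_ij^12 − C / r_ij^6], with hypothetical uniform constants `a_rep` and `c_disp` instead of Fraga's fitted parameters. The gradient returned is the one with respect to a rigid translation of molecule B.

```python
import numpy as np

def pair_energy_loop(xa, qa, xb, qb, a_rep, c_disp):
    """Reference double loop over the atoms of molecules A and B."""
    energy = 0.0
    for i in range(len(xa)):
        for j in range(len(xb)):
            r = np.linalg.norm(xb[j] - xa[i])
            energy += qa[i] * qb[j] / r + a_rep / r**12 - c_disp / r**6
    return energy

def pair_energy_vectorized(xa, qa, xb, qb, a_rep, c_disp):
    """Same double sum, with all N_A x N_B terms held in matrices.

    Returns the energy and its gradient with respect to a rigid
    translation of molecule B.
    """
    diff = xb[None, :, :] - xa[:, None, :]        # (N_A, N_B, 3): r_j - r_i
    r = np.sqrt(np.sum(diff * diff, axis=-1))    # (N_A, N_B) distance matrix
    inv_r = 1.0 / r
    inv_r6 = inv_r**6
    qq = np.outer(qa, qb)
    energy = np.sum(qq * inv_r + a_rep * inv_r6**2 - c_disp * inv_r6)
    # dE/dr for every pair, then the chain rule with dr/dr_j = (r_j - r_i) / r
    de_dr = (-qq * inv_r**2 - 12.0 * a_rep * inv_r6**2 * inv_r
             + 6.0 * c_disp * inv_r6 * inv_r)
    grad_b = np.sum((de_dr * inv_r)[:, :, None] * diff, axis=(0, 1))
    return energy, grad_b

# Hypothetical example: two small "molecules" about 8 units apart along z
rng = np.random.default_rng(1)
xa = rng.normal(size=(8, 3))
xb = rng.normal(size=(6, 3)) + np.array([0.0, 0.0, 8.0])
qa = rng.normal(scale=0.3, size=8)
qb = rng.normal(scale=0.3, size=6)
e_vec, g_vec = pair_energy_vectorized(xa, qa, xb, qb, 1000.0, 10.0)
e_loop = pair_energy_loop(xa, qa, xb, qb, 1000.0, 10.0)
```

Both versions compute the same N_A × N_B terms; the matrix form simply exposes the independence of the pair terms, which is what a vector unit exploits.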

CONCLUSION 

Two clear advantages appear when oxidation-state effects are included in the description of the phthalocyanine dimer: 1) the association energy is increased; 2) the interaction energy is less sensitive to the geometrical parameters (distance between the molecular planes and rotation angle).

The selection of atomic net charges in the parameterization of Fraga's potential is a crucial task. Careful attention must be paid to checking that the chosen charges reproduce the electrostatic multipole moments of the molecule reasonably well.

This work will be continued with a study of the growth mechanism of phthalocyanine clusters and of β-crystalline phthalocyanine. The growth of the phthalocyanine clusters is simulated by building molecular stacks of 1 to 10 units in the β-crystal structure; the geometrical arrangement between the stacks then has to be optimized (ref. 18).
SUMMARY

Structures  of  flavonoids  given  by  experimental  X-Ray  data  have  been  compared  with 
calculations  from  various  quantum  chemistry  methods. 

INTRODUCTION 

We have tried to relate geometric results obtained from X-ray data to calculated predictions using the MNDO (ref. 1), MINDO/3 (ref. 2) and more recent AM1 (ref. 3) methods for the few molecules listed in TABLE 1. We report here the preliminary results of this study.
TABLE 1 : Entries of various molecules studied

Entry   Flavonoid                        R1    R2     R3     R4     R5
1       2'-methoxy (ref. 4c)             H     H      OCH3   H      H
2       3-methoxy (ref. 4a)              H     H      H      H      OCH3
3       2',6'-dimethoxy (ref. 4b)        H     H      OCH3   OCH3   H
4       5-hydroxy-7-methoxy (ref. 5)     OH    OCH3   H      H      H
5       3-hydroxy (ref. 6)               H     H      H      H      OH

RESULTS  AND  DISCUSSIONS 

The results in TABLE 2 and TABLE 3 were obtained starting from the experimental geometries, optimizing the intercycle torsion angle and the intercycle bond length, with free rotation of the methoxy groups.

TABLE 2 : Comparison of the intercycle torsion angle obtained by the different methods

Entry   Experimental X-ray   AM1      MNDO      MINDO/3
1       -2.7°                19.7°    -85.4°    121.2°
2       37.2°                37.0°    103.8°    -
3       70.7°                50.6°    102.2°    92.6°
4       21.8°                29.2°    79°       77.6°
5       5.5°                 25.1°    78°       -


TABLE 3 : Comparison of the intercycle bond length (Å) obtained by the different methods

Entry   Experimental X-ray   AM1      MNDO     MINDO/3
1       1.475                1.460    1.492    1.504
2       1.462                1.466    1.492    -
3       1.478                1.466    1.502    1.513
4       1.485                1.470    1.502    1.520
5       1.474                1.458    -        -

The MNDO and MINDO/3 methods overestimate the stability of the perpendicular conformation, even without substitution at positions 2' and 6' (entry 4).

In each case, these two methods give larger torsion angles and longer intercycle bonds than the X-ray data.

Using the AM1 program leads to satisfactory agreement for bond lengths, since the errors are less than 0.02 Å. With regard to torsion angles, we obtain the same value for the 3-methoxy derivative (entry 2) and a small difference for 5-hydroxy-7-methoxyflavone (entry 4). There is a difference of about 20° in the three other cases (entries 1, 3 and 5).
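The deviations quoted above can be checked directly against Tables 2 and 3. The short script below simply transcribes the X-ray and AM1 columns and computes AM1 minus X-ray for each entry; the variable names are illustrative.

```python
# Intercycle bond lengths (Å) and torsion angles (degrees), X-ray vs AM1,
# transcribed from TABLE 3 and TABLE 2 (keys are the entry numbers of TABLE 1)
xray_bond = {1: 1.475, 2: 1.462, 3: 1.478, 4: 1.485, 5: 1.474}
am1_bond = {1: 1.460, 2: 1.466, 3: 1.466, 4: 1.470, 5: 1.458}
xray_torsion = {1: -2.7, 2: 37.2, 3: 70.7, 4: 21.8, 5: 5.5}
am1_torsion = {1: 19.7, 2: 37.0, 3: 50.6, 4: 29.2, 5: 25.1}

# Signed deviations AM1 - X-ray, rounded to the precision of the tables
bond_dev = {k: round(am1_bond[k] - xray_bond[k], 3) for k in xray_bond}
torsion_dev = {k: round(am1_torsion[k] - xray_torsion[k], 1) for k in xray_torsion}

print(bond_dev)      # largest magnitude is 0.016 Å (entry 5)
print(torsion_dev)   # about 20° for entries 1, 3 and 5; -0.2° for entry 2
```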

In the crystalline state, entries 3 and 5 involve intermolecular hydrogen bonds, and entry 1 involves two intramolecular hydrogen bonds which constrain the two parts of the molecule to be coplanar. Such hydrogen bonds are a possible explanation for the observed deviations.

A study of the reaction profile shows very little variation of the heat of formation (0.1 kcal/mole) when the intercycle torsion angle is varied from 40° to 140°, the intercycle distance being optimized for each angle value.
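The profile just described is a relaxed scan: the torsion angle is held at each grid value while the remaining coordinate (here the intercycle distance) is optimized. The sketch below illustrates only this procedure, using a one-dimensional golden-section search; `toy_energy` and all of its constants are hypothetical and have nothing to do with the AM1 Hamiltonian.

```python
import math

def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimize a unimodal 1-D function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    return 0.5 * (a + b)

def toy_energy(theta_deg, r):
    """Hypothetical energy (kcal/mol): a harmonic intercycle bond whose
    equilibrium length grows as the rings twist out of conjugation, plus a
    small twofold torsional term. All constants are made up.
    """
    t = math.radians(theta_deg)
    r_eq = 1.46 + 0.03 * math.sin(t) ** 2      # Å
    return 300.0 * (r - r_eq) ** 2 + 0.05 * (1.0 - math.cos(2.0 * t))

def relaxed_scan(angles_deg, energy, r_lo=1.3, r_hi=1.7):
    """Fix each torsion angle, optimize the bond length, return the profile."""
    profile = []
    for theta in angles_deg:
        r_opt = golden_section_min(lambda r: energy(theta, r), r_lo, r_hi)
        profile.append((theta, r_opt, energy(theta, r_opt)))
    return profile

for theta, r_opt, e in relaxed_scan(range(40, 141, 20), toy_energy):
    print(f"{theta:4d} deg   r = {r_opt:.4f} Å   E = {e:.3f} kcal/mol")
```

With a real semi-empirical program, `toy_energy` would be replaced by a single-point calculation at fixed torsion angle and bond length.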

The observed deviations may therefore have no physical meaning for a gas-phase molecule, since the phenyl ring rotates freely between 40° and 140°.

There is little doubt that a new MNDO study using the new parameters of J.J.P. Stewart (refs. 8-9) would give results similar to those obtained with AM1.


CONCLUSION 


Compared to MNDO and MINDO/3, AM1 gives better results in the present study, the nuclear repulsions being overestimated in MNDO and in MINDO/3.

The discrepancy between the X-ray results and the geometries calculated with AM1 may be due to the fact that we consider isolated molecules, so that interactions with near neighbours (ref. 9), especially intermolecular hydrogen bonds, are not taken into account.
SUMMARY. The possibility of intermolecular complexes between Glutathione and hydrogen peroxide has been investigated in the framework of the SIBFA method. As a first step, a study of some local minima of Glutathione, both in an isolated state and in water, has been performed. The results are compared with available experimental data.


INTRODUCTION 

Glutathione (GSH = L-γ-glutamyl-L-cysteinyl-glycine) is the major non-protein thiol compound present in cells. The nucleophilic and reductant properties of the thiol group give Glutathione a variety of roles, one of the most important being the reduction of hydrogen peroxide (H2O2) by GSH, mediated by glutathione peroxidase (ref. 1).

Recently, the reaction of H2O2 with GSH in the absence of enzyme has been studied in vitro by Abedinzadeh et al. (ref. 2). They have shown the initial fast formation of a peroxide or chelate between GSH and H2O2 (reaction (1)), followed by the oxidation of [GSH...H2O2] into [GSSG...H2O2] (reaction (2)):


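$$\mathrm{GSH} + \mathrm{H_2O_2} \longrightarrow [\mathrm{GSH}\cdots\mathrm{H_2O_2}] \qquad (1)$$

$$2\,[\mathrm{GSH}\cdots\mathrm{H_2O_2}] \longrightarrow [\mathrm{GSSG}\cdots\mathrm{H_2O_2}] + 2\,\mathrm{H_2O} \qquad (2)$$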

From a theoretical point of view, it seemed interesting to test the possibility of forming such an H-bonded complex. The present work is a preliminary study of this problem, involving both intra- and intermolecular calculations, without and with solvent interactions.

Since the experimental work has been carried out in aqueous medium buffered at pH 7.4, we were only interested in the negative ion [GSH]⁻ (see Fig. 1), which represents the state of dissociation of Glutathione at this pH value according to both ¹H-NMR (ref. 3) and ¹³C-NMR (ref. 4) studies.
Figure 1. Negative ion of Glutathione at pH 7.4: definition of the degrees of rotational freedom.

Our  work  may  be  divided  into  two  parts: 

- A study of the conformation of [GSH]⁻, both in an isolated state and in water.

- A study of the intermolecular interactions between H2O2 and [GSH]⁻, taking into account possible conformational changes of both [GSH]⁻ and H2O2. The three different structures of [GSH...H2O2] proposed by Abedinzadeh et al. (ref. 2) have been investigated.

As emphasized in section IIA, we have not undertaken a complete exploration of the entire [GSH]⁻ conformational space; in this preliminary work, our purpose was to study the behaviour of a limited number of local minimum conformations with regard to the formation of the H-bonded [GSH...H2O2] complex.
I METHOD

Both intra- and intermolecular energies have been calculated simultaneously in the framework of the SIBFA method (Sum of Interactions Between Fragments computed Ab initio). For more details concerning this method, see ref. 5. In the present work, the charge distribution of each fragment has been obtained from a multipolar expansion (ref. 6) of the ab initio SCF wave function calculated within an adapted minimal basis set (ref. 7).

As an evaluation of the solvent effect, we have only taken into consideration 'hydration water' molecules, i.e. those which are very close to the solute and thus interact very strongly with it. In order to estimate the 'hydration energy' (ΔE_hydra), it may be supposed that each water-solute interaction (E_w-s) replaces a water-water interaction (E_w-w), N_w being the number of 'hydration water' molecules; we get:

$$\Delta E_{\mathrm{hydra}} = \sum_{i=1}^{N_w} E_{w\text{-}s}^{(i)} - N_w\,E_{w\text{-}w}$$
We have used a value of 7.4 kcal/mole for E_w-w, calculated within the SIBFA method. We are aware that ΔE_hydra represents only a part (albeit an important one) of the total solvation energy in water, but such a study should give some insight into possible intramolecular conformational changes due to these strong water-solute interactions.
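As a worked illustration of the ΔE_hydra expression, the function below applies it to a list of water-solute interaction energies. The three E_w-s values in the example are invented, and E_w-w is entered as -7.4 kcal/mole on the assumption that it denotes an attractive (negative) interaction energy; the text quotes only its magnitude.

```python
def hydration_energy(e_water_solute, e_water_water):
    """Delta E_hydra = sum_i E_ws(i) - N_w * E_ww.

    Each water-solute contact is assumed to replace one water-water
    interaction; N_w is the number of 'hydration water' molecules.
    """
    n_w = len(e_water_solute)
    return sum(e_water_solute) - n_w * e_water_water

# Hypothetical example: three hydration waters (kcal/mole)
delta_e = hydration_energy([-15.0, -10.0, -8.0], -7.4)
print(round(delta_e, 1))   # -10.8
```

Subtracting N_w E_w-w makes ΔE_hydra a net gain: a water molecule only contributes favourably if it binds the solute more strongly than it would bind another water molecule.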


II CONFORMATION OF [GSH]⁻

In spite of its wide biochemical interest, experimental and theoretical data on the conformation of Glutathione are rather scarce:

- A crystal structure determination has been reported for the neutral form GSH (ref. 8).

- Among the few NMR studies carried out on Glutathione, only one (ref. 3) was concerned with the geometrical arrangement in water solution at different pH values.

- PCILO calculations (ref. 9) have been performed for both GSH and [GSH]⁻ in an isolated state.

All these results are somewhat sparse and, furthermore, they are not consistent. Such an outcome proceeds principally, on the one hand, from the flexibility of Glutathione (even when the two peptide links are kept planar with the NH and CO groups in a trans position, eleven degrees of rotational freedom still remain; see Fig. 1) and, on the other hand, from the presence of three functional groups (two acid groups plus one amino group), whose dissociation strongly depends on the pH value. Consequently, as emphasized in refs. 3 and 9, the conformational space should include a great many local minima, whose relative importance is related to changes in the functional groups and to environmental effects.


A  Isolated  state 

We have used the bond lengths and angles determined in the crystal structure (ref. 8). Our results proceed from a simultaneous variation of the eleven torsional angles (defined in Fig. 1) through an optimization process carried out with two initial guesses:

- The S-shaped 'open' conformation (denoted S) observed in the GSH crystal structure (Fig. 2a).

- A local energy minimum conformation (denoted S') obtained from a sub-map E = f(Ψ2, Φ2), all other torsional angles being frozen at the values of ref. 8. This choice reflects the fact that the Ψ2 and Φ2 dihedral angles are important for the relative position of the two peptide links, and thus for the geometrical arrangement of the central part of the molecule, i.e. the part probably involved (following ref. 2) in the [GSH...H2O2] interaction. Optimization of S and S' has led respectively to the Opt1 (Fig. 2b) and Opt2 (Fig. 2c) conformations.
Conformations   ΔE_Repul   ΔE_Disp   ΔE_El    ΔE_Pol   ΔE_intra
Opt1            14.9       -10.1     -55.8    2.6      -54.9
Opt2             9.3        -7.4     -53.9    2.3      -56.1
Opt3             8.3        -6.7     -46.1    2.3      -40.6

Table 1. Variation of the intramolecular energy ΔE_intra (and its components).
Subscripts: Repul, Disp, El, Pol for Repulsion, Dispersion, Electrostatic and Polarisation respectively. (All values in kcal/mole.)


Conformations   Φ1      Ψ1      Φ2      Ψ2      χ1      ?
S               275.1   190.0   269.1   356.5   219.0   289.0
Opt1            172.2   119.1   197.8    22.0   305.7   297.6
Opt2            193.2   123.8   178.0   126.1   300.0   314.7
Opt3            196.4   121.6   193.7    42.0   291.4   280.4

Table 2. Values of some dihedral angles defining different Glutathione conformations.
The values calculated for the five other angles are: χ ≈ 60°, χ ≈ 180°, χ ≈ 90°, Φ3 ≈ 60°, Ψ3 ≈ 30° (291° in the S conformation). (All values in degrees.)
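Table 2 reports dihedral angles on a 0-360° scale. As a reminder of how such an angle follows from the coordinates of four consecutive atoms, here is a minimal sketch using the usual IUPAC sign convention; the source does not state which convention was used, so the convention and the function name are assumptions.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, reported on a 0-360° scale.

    Sign convention: positive for a clockwise rotation of the front bond
    (p1-p0) onto the rear bond (p2-p3) when viewed along p1 -> p2.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    axis = b1 / np.linalg.norm(b1)
    # Components of the two outer bonds perpendicular to the central bond
    v = b0 - np.dot(b0, axis) * axis
    w = b2 - np.dot(b2, axis) * axis
    x = np.dot(v, w)
    y = np.dot(np.cross(axis, v), w)
    return float(np.degrees(np.arctan2(y, x)) % 360.0)

# Trans arrangement: about 180°
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

On this scale, a value such as 275° is equivalent to -85° in the -180° to +180° convention.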


From an energetical point of view, our results clearly show:

- A drastic instability of the S conformation with regard to both Opt1 and Opt2. The energy difference of about 55 kcal/mole that we have calculated between S and Opt1 or Opt2 proceeds mainly from the electrostatic component of the intramolecular energy.

- An almost equal stability of Opt1 and Opt2: the 1.2 kcal/mole energy difference we have obtained is not very significant.

From a geometrical point of view, in agreement with PCILO results (ref. 9), the optimization of S (leading to Opt1) does not significantly change the central part of the molecule (see Table 2); we have only noticed a modification of the angle α between the planes containing the two peptide linkages (92.0° and 43.6° respectively), whereas the geometrical arrangements of the glycyl residue and of the glutamyl chain are rather different:
- A displacement of the glycyl carboxyl group out of the O3N1C2 plane is observed in the S conformation (Φ1 = 275°) but not in Opt1 (Φ1 = 172°). In the latter, however, a torsion of the glycyl carboxylate oxygens around the C-C bond can be noticed (Ψ1 = 119°); thus, in contrast with the PCILO results, the glycyl residue does not adopt a Cs conformation.

- Following the nomenclature of Fujiwara et al. (ref. 3), the glutamyl chain appears to be a distorted 'open' rotamer I-3 in the S conformation and a closed rotamer II-3 in Opt1. Inspection of Fig. 2a,b clearly shows that neither the S nor the Opt1 geometry involves any H-bond between the NH3⁺ or COO⁻ glutamyl groups and the peptide backbone, but in Opt1 a weak H-bond (d(O...H) = 2.4 Å) occurs between the carboxylate group and the cysteinyl SH group. Once again our local minimum differs from the PCILO one, which indicates a preference for structures with the glutamyl carboxylate interacting with the peptide link. This will be discussed later on.

- Opt1 and Opt2 differ mainly in the value of the Ψ2 angle (53.5° and 187.5° respectively). Consequently, in Opt2 the two peptide links are in a nearly trans position, and the α angle (defined above) has varied significantly: 224.9° instead of 43.6°. Furthermore, Fig. 2c indicates a very weak interaction (d = 2.95 Å) between the glutamyl NH3 group and the cysteinyl SH group.


B In water

As a first step, we have taken into account ten 'hydration water' molecules: the intramolecular conformations of S, Opt1 and Opt2 do not change significantly, and hydrated Opt1 is now stabilized with regard to both hydrated S and hydrated Opt2; furthermore, Opt2 appears to be saturated (in the sense of 'hydration water').

Our process was continued with Opt1: saturation occurs with eighteen 'hydration water' molecules. As a result of simultaneous inter- and intramolecular optimization, we have obtained Opt3 (see Fig. 3), which is practically similar to Opt1 from a geometrical point of view (the values of the eleven dihedral angles do not differ by more than 20°). From an energetical point of view, it may be noticed that the 14.3 kcal/mole loss in intramolecular energy is balanced by a 127 kcal/mole hydration gain. Fig. 3 also shows the positions of the eighteen water molecules interacting with Opt3. As appears in this figure, one water molecule forms an H-bond bridge between the NH3 and COO groups of the glutamyl fragment. We have observed a similar situation in ab initio calculations (within the 6-31G** basis set) on the glycine molecule, which has the same topology as the COO-CH-NH3 group of the glutamyl part of [GSH]⁻.


Figure 3. Glutathione in the Opt3 conformation with 'hydration water' molecules.

GSH is represented by a heavy line. Some intermolecular distances between GSH and water are represented by dashed lines.

III [GSH]⁻...H2O2 COMPLEXES

As an initial guess, we have taken the [GSH]⁻ conformation optimized in water (Opt3) and the experimental skew form of H2O2 (τ = 120°). No optimization has been performed on isolated H2O2, since it is well known that reliable results on H2O2 are only obtained from ab initio calculations fulfilling at least two criteria: 1) the basis set employed has to be augmented by polarization functions; 2) all geometrical parameters have to be optimized for every value of τ considered (ref. 10). In this preliminary work, we have studied the three [GSH]⁻...H2O2 complexes suggested by Abedinzadeh et al. (ref. 2), namely those involving interactions between H2O2 and 1) both CO groups: [(Opt3...H2O2)CO,CO]; 2) both NH groups belonging to the two peptide links: [(Opt3...H2O2)NH,NH]; 3) the glycyl NH and glutamyl CO groups: [(Opt3...H2O2)CO,NH]. As a first step, only the intermolecular energy and the intramolecular energy of H2O2 have been optimized. Table 3 indicates that [(Opt3...H2O2)NH,NH] seems to be the most stable complex.


Complex                  E_Repul   E_Disp   E_El     E_Pol   E_CT    E_tot
(Opt3...H2O2)CO,NH        6.0      -5.8     -7.1     -0.3    -0.8    -8.0
(Opt3...H2O2)NH,NH        8.5      -7.3     -8.2     -0.8    -       -11.6
(Opt3...H2O2)CO,CO        9.1      -7.4     -8.2     -0.3    -0.8    -7.5
(Opt4...H2O2)CO,NH        8.7      -7.6     -11.6    -0.5    -1.9    -12.9

Table 3. Total intermolecular interaction energy (and its different components) for some GSH...H2O2 complexes.

Subscripts: Repul, Disp, El, Pol have the same meaning as in Table 1; CT stands for the charge-transfer component. (All values in kcal/mole.)


Inspection of Fig. 4 shows that the interactions with the CO or NH peptide groups should in fact be very weak, since the shortest distances thus calculated lie between 2.44 Å and 2.74 Å. The stability of the [(Opt3...H2O2)NH,NH] complex proceeds mainly from a strong H-bond interaction between H2O2 and the glycyl COO group (d(H...O) = 1.98 Å). It has also been noticed that the angle τ (defining the conformation of H2O2) does not deviate significantly from 120°, the value for the minimum-energy conformation.

As a second step, possible changes of both the [GSH]⁻ and H2O2 geometries have been taken into account in the optimization process. This does not markedly affect the [(Opt3...H2O2)NH,NH] and [(Opt3...H2O2)CO,CO] complexes, but it stabilizes the one involving the glycyl NH and glutamyl CO groups: [(Opt4...H2O2)CO,NH] is thus obtained (see Table 3). Strikingly, as shown by Fig. 4b, the optimization of the [GSH]⁻ geometry brings H2O2 nearer to the glycyl COO group (d(H...O) = 1.94 Å), leading to a strong H-bond interaction. The geometrical parameters defining the Opt4 conformation are practically identical to those calculated for Opt2 (the dihedral angles vary by less than 20°), and the intramolecular energy remains almost unchanged (it varies by less than 2 kcal/mole).

In this preliminary work, we have not taken the solvent effect into account, so we cannot decide which of the four complexes we have obtained is the most stable one; we may only conclude that intermolecular complexes between H2O2 and [GSH]⁻ are indeed possible. Furthermore, in the light of our results, the CO and NH peptide groups are not the most favourable interaction sites; we have shown the involvement of the glycyl COO⁻ group in the complexation phenomenon.
Figure 4. Glutathione-hydrogen peroxide complexes: a) [(Opt3...H2O2)NH,NH]; b) [(Opt4...H2O2)CO,NH].

H2O2 is represented by a heavy line. The shortest intermolecular distances are represented by dashed lines.


CONCLUSION 

In the light of the results of the calculations reported above, three main conclusions may be drawn:

1) The geometrical arrangement observed in the crystal structure of Glutathione does not represent a local minimum, either in an isolated state or in water. It must be emphasized that this determination concerns only the neutral form of Glutathione and not its negative ion, which may adopt a different conformation. This hypothesis is supported by PCILO calculations, which have shown that changes in the functional groups of glutamic acid have an important influence on the preferred conformation of the molecule. Furthermore, the molecular geometry found in the GSH crystal should proceed mainly from environmental effects since, in this structure, the GSH molecules are held together by a three-dimensional network of hydrogen bonds involving the carboxyl and amino terminal groups and the peptide nitrogen and oxygen atoms.

2) In the conformational space, besides closed (gauche) conformations involving intramolecular H-bonds between the ionized terminal glutamyl groups and atoms belonging to the two peptide links (like the III-1 rotamer found in PCILO calculations), there exist closed conformations in which neither the CO nor the NH peptide groups are involved in internal H-bonds. This result is not inconsistent with the NMR conclusions, which indicate that, at physiological pH, the gauche population (60%) may include both the PCILO conformations and ours.

3) Intermolecular complexes between GSH and H2O2 are possible. This result is not without interest from a biological point of view, since it may be conceived that, even in vivo, in the absence of Glutathione peroxidase, Glutathione may mask H2O2 by complexing it.

We are aware that this work represents only a first step towards answering our problem, and our results can be considered only as qualitative. We have especially studied the possibility that some conformations of GSH induce intermolecular complexes with H2O2. Obtaining quantitative results (which should lead to the association constant of the [GSH]⁻...H2O2 complex) requires the exploration of a more complete [GSH]⁻ conformational space, taking the whole solvent into account, and the study of complex configurations which may differ from those proposed by Abedinzadeh et al. (ref. 2). These calculations are now under way.

The authors acknowledge financial support of these studies by the "Groupement Scientifique IBM-CNRS: Modélisation moléculaire".
SUMMARY

The  interface  between  a  membrane,  modelled  by  an  ensemble  of 
COO-  groups  with  rotational  and  translational  degrees  of  freedom, 
and  liquid  water  with  solvated  ions  is  studied  by  molecular 
dynamics  (MD)  computer  simulations.  The  charged  membrane  leads  to 
a  layering  of  the  liquid.  Several  water  layers  can  be 
distinguished  with  structural  and  dynamic  properties  very 
different  from  those  found  in  the  bulk  phase. 


INTRODUCTION 

A  detailed  knowledge  of  the  microscopic  structure  and 
dynamics  in  the  interfacial  region  between  water,  or  aqueous 
solutions,  on  one  side,  and  a  biomembrane  on  the  other,  is  an 
essential  prerequisite  for  the  understanding  of  many  elementary 
biological  processes  [1]. 

In the present work, we employ the molecular dynamics (MD) computer simulation method in a first attempt to study the typical phenomena which may occur at such an interface between pure water or a solution and a membrane. At present we do not attempt to give a quantitative description, but rather to determine typical features and trends.


THE  MODEL  AND  DETAILS  OF  THE  SIMULATION 

The system studied consisted of 737 TIP4P water molecules and of 60 COO⁻ groups representing the membrane surfaces, one on each side of a water lamina (see Fig. 1). The density of the headgroups was varied between 0.042 Å⁻² and 0.05 Å⁻². Na⁺ ions were added to the system until charge neutralization was reached.


Fig. 1 Model system (schematically): H2O molecules, COO⁻ headgroups and Na⁺ ions.

The total potential energy of the system is partitioned into the following contributions:

$$V_{\mathrm{tot}} = V_{W\text{-}W} + V_{W\text{-}S} + V_{S\text{-}S} + V_{W\text{-}I} + V_{S\text{-}I} + V_{I\text{-}I} \qquad (1)$$

where W stands for a water molecule, S for a headgroup in the membrane (COO⁻ groups) and I for a Na⁺ ion. The TIP4P model potential [3] is used for the intermolecular water-water interaction V_W-W. The other interaction potentials were developed along the same lines, as sums of Coulomb and Lennard-Jones (12,6) site-site pair potentials, using a model by Jorgensen and Gao [4]. A model potential due to Bounds [5] was used for V_W-I. The simulations are carried out in the usual fashion at constant number of particles, volume and total energy (NVE-MD). Full details of the model potentials and of the simulations are given elsewhere [6].
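The terms of eq. (1) are built from Coulomb and Lennard-Jones (12,6) site-site contributions. A minimal sketch of such a sum between two rigid molecules follows; the site-tuple layout, the geometric-mean combining rule and the numbers in the example are assumptions, not the parameters of refs. [3]-[5]. V_tot would be obtained by summing this pair energy over all pairs of waters, headgroups and ions; the periodic boundary conditions and cutoffs of a real MD code are omitted.

```python
import math

COULOMB_KCAL = 332.06  # e^2/(4 pi eps0) in kcal·Å/mol (approximate value)

def site_site_energy(sites_a, sites_b):
    """Interaction energy (kcal/mol) of two rigid molecules as a sum of
    Coulomb and Lennard-Jones (12,6) terms over all pairs of sites.

    Each site is a tuple (x, y, z, q, epsilon, sigma) with coordinates in Å,
    charge in e, epsilon in kcal/mol and sigma in Å. The geometric-mean
    combining rule used here is an assumption of this sketch.
    """
    energy = 0.0
    for xa, ya, za, qa, eps_a, sig_a in sites_a:
        for xb, yb, zb, qb, eps_b, sig_b in sites_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            eps = math.sqrt(eps_a * eps_b)
            sig = math.sqrt(sig_a * sig_b)
            sr6 = (sig / r) ** 6
            energy += COULOMB_KCAL * qa * qb / r + 4.0 * eps * (sr6 * sr6 - sr6)
    return energy

# Two hypothetical neutral LJ sites at the minimum r = 2**(1/6) * sigma
site = (0.0, 0.0, 0.0, 0.0, 0.2, 3.0)
partner = (2 ** (1 / 6) * 3.0, 0.0, 0.0, 0.0, 0.2, 3.0)
print(site_site_energy([site], [partner]))   # approximately -0.2 (= -epsilon)
```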


STRUCTURAL  RESULTS  OF  THE  INTERFACIAL  SYSTEM 
The Structure of the Aqueous Phase

A strong influence of the charged groups of the membrane on the adjacent liquid is to be expected from work on ionic hydration [7] and on charged surfaces [8]. In the interfacial region, three structuring effects, namely those of the bulk phase, of the membrane, and of ionic hydration, are in competition, leading to interesting cooperative effects (Fig. 2).


Fig. 2 Density profiles for the water molecule oxygens as a function of their distance from the plane of the carbon atoms, for pure water (a), 35 ions (b) and 60 ions (c).

It is seen that the presence of the ions damps the strong density oscillations observed in pure water. Owing to the adsorption of ions in the immediate vicinity of the membrane (vide infra), the first two layers collapse into one. The overall increase in water density in the vicinity of the membrane, by a factor of about two compared to pure water, remains roughly the same in all systems. Such density increases have been found experimentally by Joosten [9].
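Profiles such as those of Figs. 2 and 3 are typically obtained by histogramming the positions of the tracked atoms in slabs parallel to the membrane and dividing by the slab volume. A minimal sketch follows. It assumes that distances have already been measured from the membrane plane, that the lateral box area is constant, and that positions are pooled over `n_frames` configurations. Dividing the result by the bulk density would give the relative enhancement discussed above.

```python
import numpy as np

def density_profile(z, area, z_max, bin_width, n_frames=1):
    """Number density (particles per Å^3) in slabs parallel to the membrane.

    z: distances (Å) of the tracked atoms, e.g. water oxygens, from the
       membrane plane, pooled over n_frames configurations;
    area: lateral area of the simulation box (Å^2).
    """
    n_bins = int(round(z_max / bin_width))
    edges = np.linspace(0.0, n_bins * bin_width, n_bins + 1)
    counts, _ = np.histogram(z, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, counts / (n_frames * area * bin_width)

# Hypothetical toy input: three atoms, lateral area 10 Å^2, 1 Å slabs
centers, rho = density_profile([0.5, 0.5, 1.5], area=10.0, z_max=2.0, bin_width=1.0)
print(centers, rho)
```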

Figure 3 shows the density profiles for the ions. These were added slowly to the pure water system by replacing water molecules near the center of the lamina. About 4 to 8 ions were added at a time, and the system was then allowed to re-equilibrate.
Fig. 3 Density profiles for sodium ions as functions of the distance from the membrane surface: 35 ions (a), 60 ions (b).

An interesting feature is the saturation of the first layer of adsorbed ions at about 12 ions per 30 COO⁻ groups. This corresponds to a particularly favorable geometric arrangement of the ions with respect to the membrane groups. Further addition of ions leads to the formation of a weakly pronounced second layer and finally of a diffuse distribution throughout the lamina.

The density oscillations seen in Figure 2 indicate a layering of the water molecules parallel to the membranes and suggest interpreting this phenomenon as adsorption. We shall make use of this concept of 'adsorbed' layers of water in order to further characterize the structure of the water adjacent to the membranes. Between these adsorbed layers and the bulk phase, a structure-broken region can be identified [6]. Figure 4 shows the distribution functions of the oxygen-oxygen distances within the first layer of adsorbed water, with and without ions, and for the "bulk" phase (central part of the water lamina).
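Layer-resolved pair distribution functions such as those of Figure 4 can be estimated by restricting the pair count to molecules in one layer and normalizing by the count expected for a uniform two-dimensional distribution. The sketch below assumes in-plane (x, y) oxygen coordinates, periodic boundaries in the membrane plane and a purely 2D normalization; the paper does not state which normalization was used, so this is one plausible choice, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def layer_rdf(frames: Sequence[Sequence[Point]], box_x: float, box_y: float,
              r_max: float, dr: float) -> List[float]:
    """In-plane O-O pair distribution function g(r) for molecules in one layer.

    frames : per configuration, the (x, y) oxygen positions of the molecules
             assigned to the layer; the box is periodic in x and y.
    r_max should not exceed half the shorter box length (minimum image).
    """
    n_bins = int(r_max // dr)
    g = [0.0] * n_bins
    area = box_x * box_y
    used = 0
    for points in frames:
        n = len(points)
        if n < 2:
            continue
        used += 1
        counts = [0] * n_bins
        for i in range(n - 1):
            xi, yi = points[i]
            for j in range(i + 1, n):
                dx = points[j][0] - xi
                dy = points[j][1] - yi
                dx -= box_x * round(dx / box_x)  # minimum-image convention
                dy -= box_y * round(dy / box_y)
                r = math.hypot(dx, dy)
                if r < n_bins * dr:
                    counts[min(int(r // dr), n_bins - 1)] += 1
        n_pairs = n * (n - 1) / 2
        for k in range(n_bins):
            ring_area = math.pi * (((k + 1) * dr) ** 2 - (k * dr) ** 2)
            ideal = n_pairs * ring_area / area  # pairs expected for a uniform 2D layer
            g[k] += counts[k] / ideal
    return [v / used for v in g] if used else g
```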


Fig. 4 Oxygen-oxygen pair distribution functions: pure water, first layer (a); 35 ions, first layer (b); 60 ions, first layer (c); 60 ions, bulk phase (d).


In the pure water system, the ordering of the water in the first layer is almost completely determined by the structure of the membrane (a). Owing to the hydration of the ions, with a typical Na⁺-water distance of 2.3 Å, and to the influence of the ions on the membrane itself, this structure is modified in the presence of ions (b, c). In the presence of ions, the structure-broken transition region between the water layers and the bulk water is much less pronounced than in the pure water case.


THE  DYNAMICS  OF  THE  MEMBRANE-LIQUID  INTERFACE 

There are substantial differences between the dynamical behavior of bare model membranes and that of membranes in contact with water or ionic solution [6]. Fluid-like motions with very large amplitudes and occasional exchanges take place in the bare membrane. These motions are quite efficiently hindered by the adsorbed water molecules and also by the ions.
Detailed information about the dynamics can be obtained from the velocity autocorrelation function

$$c_{vv}(t) = \frac{1}{T}\int_0^T \langle \mathbf{v}_i(\tau+t)\cdot\mathbf{v}_i(\tau)\rangle \, d\tau \qquad (2)$$

and its Fourier cosine transform (spectral density, Fig. 5)

$$C_{vv}(\omega) = \int_0^\infty c_{vv}(t)\cos(\omega t)\, dt \qquad (3)$$

The zero-frequency term of $C_{vv}(\omega)$ is proportional to the self-diffusion coefficient. The averages in equation (2) (angle brackets) have been carried out over water layers parallel to the membrane.
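Equations (2) and (3) translate directly into discrete sums over stored velocities. The sketch assumes a trajectory sampled at a fixed step dt, averages over molecules and time origins as in eq. (2), and truncates the integral of eq. (3) at the longest computed lag with the trapezoidal rule; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

SPEED_OF_LIGHT_CM_PER_PS = 2.99792458e-2  # c in cm/ps


def vacf(velocities: Sequence[Sequence[Vec]], max_lag: int) -> List[float]:
    """Discrete form of eq. (2): c_vv at lags 0..max_lag (in steps).

    velocities[m][i] is the velocity of molecule i at time step m; the average
    runs over molecules i and over all available time origins m.
    """
    n_steps, n_mol = len(velocities), len(velocities[0])
    if not 0 <= max_lag < n_steps:
        raise ValueError("max_lag must be smaller than the number of steps")
    c = []
    for k in range(max_lag + 1):
        n_origins = n_steps - k
        total = 0.0
        for m in range(n_origins):
            for a, b in zip(velocities[m + k], velocities[m]):
                total += a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
        c.append(total / (n_origins * n_mol))
    return c


def spectral_density(c: Sequence[float], dt: float, omega: float) -> float:
    """Eq. (3) truncated at t_max = (len(c) - 1) * dt, trapezoidal rule.

    omega is in radians per time unit of dt."""
    f = [ck * math.cos(omega * k * dt) for k, ck in enumerate(c)]
    return dt * (sum(f) - 0.5 * (f[0] + f[-1]))


def wavenumber_to_omega(nu_cm: float) -> float:
    """Angular frequency (rad/ps) for a wavenumber in cm^-1: omega = 2*pi*c*nu."""
    return 2.0 * math.pi * SPEED_OF_LIGHT_CM_PER_PS * nu_cm
```

For example, `spectral_density(c, dt, 0.0) / 3.0` is the Green-Kubo estimate of the three-dimensional self-diffusion coefficient (in units of velocity squared times time), which is why the zero-frequency value is proportional to D. With dt in ps, `wavenumber_to_omega(50.0)` (about 9.4 rad/ps) locates the 50 cm⁻¹ peak discussed in the text.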
Fig. 5 Spectral densities obtained for the hindered translations of the water molecules in various layers. The self-diffusion coefficients, determined separately for motions perpendicular and parallel to the surface, are also given.
The immobilization of the water by the membrane leads to a decrease of the self-diffusion coefficient by a factor of about 10. A similar trend has been found in simulations of smectic liquid crystals [10]. Even in the center of the lamina, where the structure strongly resembles that of pure water, the self-diffusion remains somewhat anisotropic. Apart from this feature, the spectral density, with its characteristic peak at 50 cm⁻¹ and the pronounced shoulder around 200 cm⁻¹, is very similar to that of pure water [11]. Moving closer to the membrane, the spectral density is continuously altered and shifted to higher frequencies. A detailed analysis reveals that the lower-frequency peak in the adsorbed layer is due mostly to motions of the water molecules parallel to the membrane surface. This peak has been found to be sensitive to the density of the headgroups in the membrane.
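Figure 5 quotes self-diffusion coefficients determined separately for motions parallel and perpendicular to the surface. One way to obtain them, sketched below, is to apply the one-dimensional Green-Kubo relation to each Cartesian component of the velocity autocorrelation function. The sketch assumes that z is the membrane normal and that the selected molecules remain in their layer over the correlation time; how the authors handled molecules leaving a layer is not stated, and the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def component_vacf(velocities: Sequence[Sequence[Vec]], axis: int,
                   max_lag: int) -> List[float]:
    """<v_a(t0 + t) v_a(t0)> for one Cartesian component a, averaged over
    molecules and time origins (velocities[m][i] = molecule i at step m)."""
    n_steps, n_mol = len(velocities), len(velocities[0])
    if not 0 <= max_lag < n_steps:
        raise ValueError("max_lag must be smaller than the number of steps")
    c = []
    for k in range(max_lag + 1):
        n_origins = n_steps - k
        total = sum(a[axis] * b[axis]
                    for m in range(n_origins)
                    for a, b in zip(velocities[m + k], velocities[m]))
        c.append(total / (n_origins * n_mol))
    return c


def green_kubo(c: Sequence[float], dt: float) -> float:
    """One-dimensional Green-Kubo integral D_a = int_0^t_max c_a(t) dt (trapezoidal)."""
    return dt * (sum(c) - 0.5 * (c[0] + c[-1]))


def anisotropic_diffusion(velocities: Sequence[Sequence[Vec]], dt: float,
                          max_lag: int) -> Tuple[float, float]:
    """Return (D_parallel, D_perpendicular), assuming z is the membrane normal."""
    d = [green_kubo(component_vacf(velocities, a, max_lag), dt) for a in range(3)]
    return 0.5 * (d[0] + d[1]), d[2]
```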

CONCLUSION 

We have investigated the aqueous solution/charged membrane interface with respect to the static and dynamic properties of the liquid and of the membrane. We attempted to make the system sufficiently large to include the complete transition region between bulk water and water adsorbed to the membrane. The density of the water is increased by a factor of about 2 next to the membrane. The overall structural influence of the membrane extends about 8 to 9 Å into the liquid. This range and the details of the ordering depend on the surface density of the membrane and on the ionic concentration. The single-molecule dynamics of the water reveal a longer range of influence: for the hindered translation, the transition between ordered layers and bulk is continuous, in contrast with the finding for the structure.
SUMMARY

The purpose of this study is to determine the conformation of the zwitterionic polar head of Amphotericin B in both the isolated and hydrated states.


INTRODUCTION 

In recent years there has been a burgeoning interest in the biochemical mode of action of polyene macrolide antibiotics, which appear promising as antifungal agents. One member of this group, Amphotericin B (AMB, Fig. 1), is the most commonly used in therapy (ref. 1).


Figure 1. Amphotericin B. Definition of the fragments and the degrees of rotational freedom.
These antibiotics reversibly induce permeability only to monovalent cations in both micro-organisms and animal cells (refs 2, 3). The generally accepted mechanism of the ionophoric action of these molecules is based on the assumption that they form complexes with sterols, so that there is competition between these molecules and the phospholipids in the membrane (ref. 4).

It was shown that the polyene macrolides entering the membrane align themselves parallel to the sterol and phospholipid molecules (ref. 5), with their polar head at the membrane/water interface. In this position, the functional groups of the polar head (COO⁻ and NH₃⁺) may interact through H-bonding with the β-OH of the sterol. Hydrophobic interactions take place between the steroid nucleus (and its alkyl tail) and the rigid polyene part of the macrocyclic ring. Both interactions contribute to the stability of the sterol-polyene complex (ref. 6).

Recently, the permeability induced by AMB derivatives in large unilamellar lipid vesicles containing various sterols has been studied by M. Hervé et al. (ref. 7) using the proton-cation exchange method and ³¹P-NMR spectroscopy. From the results thus obtained, two groups of polyene antibiotics have been distinguished according to their ionophoric properties. It has been noticed that polyenes of group I have a free ionizable carboxyl group, whereas those of group II do not. M. Hervé et al. (ref. 7) have formulated a hypothesis on the role of polyene-sterol interactions in the mode of action of polyene antibiotics, depending on the group to which they belong (see Fig. 2a, b).


Figure 2. Model of the polyene-sterol complex (ref. 7): a) for group I compounds, b) for group II compounds.

Such a model needs to be borne out by a study of the energetics of polyene-sterol interactions by computer modelling. For such a theoretical study, a search for the preferred conformation of AMB and its derivatives is clearly of prime necessity, but until now experimental data concerning the structure of this group of molecules are very scarce, apart from the crystal structure determination of N-iodoacetyl Amphotericin B (ref. 8).


Thus the goal of this work (which represents only a preliminary step in the study of such a problem) is to perform a series of conformational investigations of the zwitterionic form of Amphotericin B. This ionization state has been chosen because, in biological media, the polar head of Amphotericin B is in contact with water. Our study will therefore deal with both the isolated and the hydrated states of Amphotericin B.


I METHOD

Both intra- and intermolecular energies have been calculated simultaneously using the SIBFA method (Sum of Interactions Between Fragments computed Ab initio).

1) Intermolecular energy

The procedure is based on the use of additive components of the intermolecular interaction energy (ref. 10), whose expressions are fitted so as to satisfactorily reproduce the results of ab initio SCF supermolecule calculations on small complexes:

$$E = \sum_{k=1}^{N-1}\sum_{l=k+1}^{N} E_{\text{inter}}(k,l) \qquad (1)$$

where N is the total number of interacting molecules, with

$$E_{\text{inter}} = E_{\text{MTP}} + E_{\text{POL}} + E_{\text{REP}} + E_{\text{DISP}} + E_{\text{CT}} \qquad (2)$$

- $E_{\text{MTP}}$: the electrostatic interaction energy between the multipolar expansions (ref. 11) of the ab initio electron densities of the interacting molecules (calculated within an adapted minimal basis set, ref. 12).

- $E_{\text{POL}}$: the corresponding polarization component.

- $E_{\text{REP}}$ and $E_{\text{DISP}}$: the repulsion and dispersion contributions, respectively.

- $E_{\text{CT}}$: the charge-transfer contribution.
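Equations (1) and (2) are pure bookkeeping once the five components are known for every pair. The sketch below only illustrates that additive structure: it assumes the component energies of each pair have already been evaluated (the SIBFA formulas themselves are not reproduced), and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Tuple


@dataclass(frozen=True)
class PairComponents:
    """The five components of eq. (2) for one pair of molecules (kcal/mol)."""
    mtp: float   # multipole electrostatics
    pol: float   # polarization
    rep: float   # repulsion
    disp: float  # dispersion
    ct: float    # charge transfer

    def total(self) -> float:
        return self.mtp + self.pol + self.rep + self.disp + self.ct


def total_interaction(n_molecules: int,
                      pairs: Dict[Tuple[int, int], PairComponents]) -> float:
    """Eq. (1): sum of E_inter(k, l) over all pairs k < l.

    Pairs absent from `pairs` are treated as non-interacting (zero)."""
    return sum(pairs[(k, l)].total()
               for k, l in combinations(range(n_molecules), 2)
               if (k, l) in pairs)
```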

2)  Intramolecular  energy 

In the SIBFA method, a large molecule is built out of constitutive molecular fragments separated by single bonds (ref. 13). In practice, the variation of the conformational energy is calculated as a sum of inter-fragment interaction energies:

$$\Delta E_{\text{intra}} = \sum_{k=1}^{N-1}\sum_{l=k+1}^{N} E_{kl} \qquad (3)$$

where N is now the number of fragments.

Each fragment-pair term of eq. (3) is calculated as the sum of the first four contributions given in eq. (2), plus a transferable torsional energy contribution calibrated for elementary rotations around single bonds (for more details concerning this method, see refs 10 and 13).
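Equation (3) applies the same additive scheme to the fragments of one molecule, using four of the components of eq. (2) plus a torsional term. The functional form of the calibrated SIBFA torsional term is not given here, so the sketch substitutes a generic one-term Fourier torsion as a hypothetical stand-in, and leaves the choice of which fragment pairs to include to the caller. All names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def torsion_energy(phi_deg: float, barrier: float, periodicity: int,
                   phase_deg: float = 0.0) -> float:
    """Hypothetical one-term Fourier torsion (V/2)[1 + cos(n*phi - gamma)].

    Stand-in only: the calibrated SIBFA torsional term is not given in the text."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(periodicity * phi_deg - phase_deg)))


def intramolecular_energy(n_fragments: int,
                          pair_terms: Dict[Tuple[int, int], Sequence[float]],
                          torsions: Sequence[Tuple[float, float, int]]) -> float:
    """Sum over fragment pairs k < l of (E_MTP + E_POL + E_REP + E_DISP),
    plus one torsional term per rotatable bond given as (phi_deg, barrier, n).

    Which fragment pairs are included is left to the caller (assumption)."""
    nonbonded = sum(sum(pair_terms[p])
                    for p in combinations(range(n_fragments), 2)
                    if p in pair_terms)
    torsional = sum(torsion_energy(phi, v, n) for phi, v, n in torsions)
    return nonbonded + torsional
```

A conformational energy difference of the kind listed in Table 2 would then be the value of `intramolecular_energy` for a trial conformation minus the same quantity for the reference conformation A.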
As emphasized by Gresh et al. (ref. 13), the constitutive fragments used to calculate the charge distribution of the whole molecular entity can be defined following two different procedures:

a) One may compute the multipoles for the fragments that are used in the calculation of inter-fragment interactions.

b) One may compute the multipoles of the largest fragments that can be calculated within an ab initio framework, and then define subfragments for calculating conformational energy changes.

The choice between these two procedures is critical and depends on the molecules and fragments studied. This point will be discussed in the Appendix.

As an evaluation of the solvent effect, we have only taken into consideration 'hydration water' molecules, i.e. those which are very close to the solute and thus interact very strongly with it. In order to estimate the 'hydration energy' ($\Delta E_{\text{Hydra}}$), it may be assumed that each water-solute interaction ($E_{w\text{-}s}$) replaces a water-water interaction ($E_{w\text{-}w}$), so that:

$$\Delta E_{\text{Hydra}} = \sum_{i=1}^{N_w} E_{w\text{-}s}^{(i)} - N_w\, E_{w\text{-}w}$$

$N_w$ being the number of 'hydration water' molecules. We have used the value of $E_{w\text{-}w}$ calculated with the SIBFA method. We are aware that $\Delta E_{\text{Hydra}}$ represents only part of the total solvation energy in water, but such a study should nevertheless give some insight into possible intramolecular conformational changes due to these strong water-solute interactions.
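The hydration energy estimate above is a simple sum once the water-solute energies are available. The sketch assumes one water-solute interaction energy per hydration water and a single reference water-water energy; the numbers in the check are hypothetical, not taken from this study, and the function names are illustrative.

```python
from typing import Sequence


def hydration_energy(e_water_solute: Sequence[float], e_water_water: float) -> float:
    """Delta E_Hydra = sum_i E_w-s(i) - N_w * E_w-w.

    e_water_solute : interaction energy of each 'hydration water' molecule with
                     the solute (kcal/mol); its length is N_w.
    e_water_water  : energy of the water-water interaction that each of them
                     is assumed to replace (kcal/mol)."""
    return sum(e_water_solute) - len(e_water_solute) * e_water_water


def hydration_difference(e_ws_ref: Sequence[float], e_ws_conf: Sequence[float],
                         e_water_water: float) -> float:
    """Difference of Delta E_Hydra between a conformation and a reference one."""
    return (hydration_energy(e_ws_conf, e_water_water)
            - hydration_energy(e_ws_ref, e_water_water))
```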


II CONFORMATION OF THE POLAR HEAD OF AMB

A  -  Isolated  state 

In the present study, all bond lengths and bond angles were fixed at the values obtained from the X-ray study (ref. 8). Because of the presence of a conjugated double-bond system, we consider that the heptaene macrolactone ring remains rigid and therefore independent of the surrounding medium; this has been confirmed by Perun and Egan (ref. 9) in the case of erythromycin, another macrolide antibiotic: the conformation of the macrolactone ring in solution does not change compared with the crystal state. Hence we have kept this part of the molecule in the conformation established in the crystal and have been interested only in the flexible polar head. Furthermore, we assume that the heptaene macrolactone ring of Amphotericin B does not influence the conformation of the polar head part of the molecule. Indeed, preliminary calculations performed with and without this 'rigid tail' led to strictly identical results.


The conformational energy is expressed as a function of the five variable dihedral angles αᵢ (i = 1, ..., 5) defined in Fig. 1, with the geometrical arrangement determined in the crystal (conformation A) chosen as the initial guess:

a) We have calculated different conformational energy submaps E = f(α₂, α₃) for systematically varied values of the α₁ dihedral angle, first keeping α₄ and α₅ fixed at 0.0° and 150.0°, respectively. The deepest energy minimum thus obtained (conformation B₂) has been refined by an automatic minimization process involving the five variable dihedral angles simultaneously. This process has been repeated for different values of α₄ and α₅.

b) We have also performed direct minimizations involving the five torsional angles simultaneously.

Our conformational investigation using either strategy led to nearly identical results, from both a geometrical and an energetic point of view.
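Strategy (a) combines a systematic two-angle scan with a subsequent all-angle minimization, while strategy (b) starts the minimization directly. The sketch below mimics that workflow on a toy energy surface. The grid step, the compass-search minimizer and `toy_energy` are all hypothetical stand-ins, since the paper does not specify its minimization algorithm and the real surface is the SIBFA energy.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Energy = Callable[[Sequence[float]], float]


def grid_scan(energy: Energy, start: Sequence[float], i: int, j: int,
              step: float = 30.0) -> List[float]:
    """Strategy (a), first stage: systematic submap E = f(angle_i, angle_j) with
    all other dihedrals held at their values in `start` (degrees).
    Returns the angles of the lowest point found (start included)."""
    best, best_e = list(start), energy(start)
    grid = [k * step for k in range(int(round(360.0 / step)))]
    for a, b in itertools.product(grid, grid):
        trial = list(start)
        trial[i], trial[j] = a, b
        e = energy(trial)
        if e < best_e:
            best, best_e = trial, e
    return best


def refine(energy: Energy, start: Sequence[float], step: float = 10.0,
           tol: float = 1e-3) -> Tuple[List[float], float]:
    """Relax all dihedrals with a simple compass search, halving the step
    until it falls below `tol` (stand-in for the automatic minimizer)."""
    x = [a % 360.0 for a in start]
    e = energy(x)
    while step > tol:
        improved = False
        for k in range(len(x)):
            for d in (step, -step):
                trial = list(x)
                trial[k] = (trial[k] + d) % 360.0
                e_trial = energy(trial)
                if e_trial < e:
                    x, e, improved = trial, e_trial, True
        if not improved:
            step /= 2.0
    return x, e


def toy_energy(angles: Sequence[float]) -> float:
    """Hypothetical separable test surface with a single minimum (not AMB)."""
    target = (180.0, 290.0, 90.0, 315.0, 175.0)
    return sum(1.0 - math.cos(math.radians(a - t)) for a, t in zip(angles, target))
```

For strategy (a), one would call `grid_scan` over α₂ and α₃ (indices 1 and 2) for several fixed α₁ values and pass the best point to `refine`; strategy (b) corresponds to calling `refine` directly on the starting conformation.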

The two minima we obtained (denoted C₁ and C₂) are very close in geometry and differ mainly in the value of the α₄ dihedral angle, which defines the position of the sugar hydroxyl hydrogen (see Table 1).


Conformations     α₁      α₂      α₃      α₄      α₅
A               67.7   272.4   142.1     0.0   150.0
B₁              67.7   302.4   117.1     0.0   150.0
B₂             187.7   292.4    92.1     0.0   150.0
C₁             183.8   292.6    92.0   315.4   175.2
C₂             202.7   285.7    83.5   220.0   177.3

Table 1. Values of dihedral angles (in degrees) defining different Amphotericin B conformations in the isolated state.


Conformation A, determined in the crystal, is computed to be 34.4 kcal/mol above our minimum C₁, from which it differs mainly in the values of the α₁ and α₃ angles.

Analysis of the submap E = f(α₂, α₃) calculated with α₁ fixed at its value in the X-ray structure showed that only a restricted area is allowed in the α₂-α₃ conformational subspace. It yielded a unique minimum, the 24.4 kcal/mol energy stabilization resulting from a simultaneous change in the α₂ and α₃ angles (conformation B₁ in Tables 1 and 2). Varying the value of the α₁ angle does not drastically change the shape of our submaps: the location of the unique minimum is only slightly shifted in the α₂-α₃ subspace (conformation B₂), with an additional stabilization energy of 6 kcal/mol.
Conformations   ΔE_Repul   ΔE_Disp   ΔE_El   ΔE_Pol   ΔE_Intra
B₁                18.1      -11.3    -25.9      0.1     -24.4
B₂                21.8      -18.7    -33.8     -0.7     -30.3
C₁                25.1      -21.3    -36.6     -0.8     -34.4
C₂                23.7      -22.3    -27.6     -1.3     -28.0

Table 2. Variation of the intramolecular energy ΔE_Intra (and its components). The energy of conformation A is taken as the zero of energy. Subscripts: Repul, Disp, El, Pol for repulsion, dispersion, electrostatic and polarization, respectively. (All values in kcal/mol.)


The conformation of the polar head in the zwitterionic form is mainly governed by electrostatic forces (see Table 2). The stabilization of the closed conformations (C₁ or C₂) is due to an array of three intramolecular H-bonds (see Fig. 3a-b), one of them connecting the lactone ring and the sugar moiety.


Figure 3. Amphotericin B conformations in the isolated state: a) from X-ray data (A), b) calculated (C₁).


The preferential stabilization of the 'folded' conformation with respect to A was confirmed by SCF ab initio calculations with our minimal 'adapted' basis set (ref. 12), performed on both the A and B₂ geometries: B₂ was calculated to be more stable than A by 22 kcal/mol. A more quantitative agreement was not anticipated since it is well known that the SCF method does not take correlation effects into account; nevertheless, we think that the qualitative agreement is very satisfactory.

B - In water

In a first step, five 'hydration' water molecules were taken into account. The sum of inter- and intramolecular energies was minimized starting from the crystal conformation (A) and from the two folded conformations (C₁ and C₂). The presence of these water molecules does not significantly influence the geometrical arrangement of the C₁ and C₂ structures, whose intramolecular energy remains almost unchanged. Optimizing the A conformation leads to a small change in the α₃ dihedral angle (125° instead of 142°), giving a structure denoted A' which is intermediate between the 'open' A and the 'folded' C ones, similar to the B₁ conformation (Table 1). From an intramolecular point of view, we note a 10.0 kcal/mol stabilization of A' with respect to the A conformation; thus, when the intermolecular energy is taken into account, the total energy difference between A' and C'₁ (the hydrated molecules favouring C'₁) is reduced to 5.8 kcal/mol (Table 3).


Compared conformations   Nw   ΔE_Intra   ΔE_Inter   ΔE_Tot
C'₁ / A'                  5     -13.6        7.8      -5.8
C''₁ / A''                9     -10.8        9.4      -1.4
C''₂ / A''                9     -10.7       12.1       1.4

Table 3. Variations of the total energy ΔE_Tot of hydrated structures (and its components). Nw is the number of water molecules. Subscripts: Intra, Inter for intramolecular and intermolecular. ΔE_Inter represents the difference in ΔE_Hydra (defined in Section I) between the related conformations. (All values in kcal/mol.)


In a second step, four additional water molecules were added in order to saturate the polar head. After both intra- and intermolecular optimizations, we noted an additional 6 kcal/mol intramolecular energy gain for the A'' conformation. Overall, when the total energy of the system including the nine water molecules is considered, comparable stabilities (to within 2 kcal/mol) are derived for the A'', C''₁ and C''₂ hydrated structures (Tables 3 and 4). (In Table 3, ΔE represents the energy differences between the C' and A' (respectively C'' and A'') conformations; the energy of A' (respectively A'') is taken as the zero of energy.)
Conformations     α₁      α₂      α₃      α₄      α₅
A''             79.4   284.8   125.5     7.4   162.4
C''₁           188.8   288.9    89.3    36.9   151.4
C''₂           213.4   283.8    87.1   215.2   180.1

Table 4. Values of dihedral angles (in degrees) defining different Amphotericin B conformations surrounded by nine water molecules.


Figures 4a-b illustrate the positions of the nine water molecules surrounding the polar head in the A'' and C''₂ conformations.


Figure 4. Polar head of AMB with 'hydration water' molecules: a) conformation A'', b) conformation C''₂. AMB is represented by heavy lines. Some intermolecular distances between AMB and water are represented by dashed lines.

It may be noticed that the hydrated structure still maintains the intramolecular H-bonding that occurs in the water-free zwitterion.

A striking result is the occurrence of an H-bonded bridging water configuration involving: a) the COO⁻, NH₃⁺ and OH groups in C''₁ and C''₂; b) the COO⁻ and NH₃⁺ groups in A''.

At this stage, we wish to emphasize that our results are distinct from those obtained by Rinnert and Maigret (ref. 14), who found that the conformational behaviour of Amphotericin B is independent of the electrical state of the molecule itself and of its surrounding medium. This has led us to investigate whether such a result might stem from the electrostatic energy contribution, which was computed in ref. 14 by means of Pariser-Parr-Del Re charges. For that purpose, we recomputed the conformational energy differences between the A and C conformations, now using Mulliken atomic charges rather than the multipolar expansion. These computations (unpublished) again yielded a preferential stabilization of C.


CONCLUSION 

The results of our computations show that, in the five-dimensional conformational space, the geometrical structure of Amphotericin B is principally governed by two dihedral angles, α₁ and α₃. The minima found in the isolated state, corresponding to folded conformations stabilized by intramolecular H-bonds, remain stable in the presence of hydrating water molecules. In the hydrated state, another energy minimum was found, characterized by a) a distinct arrangement of the water molecules and b) a geometrical structure intermediate between the folded and the open ones. These three structures were found to be nearly isoenergetic. These results thus exemplify the effects of environmental factors on the conformational preferences of this molecule: the system adjusts so as to stabilize its total intra- plus intermolecular energy. It is therefore not surprising that the geometrical structures we have determined in the isolated or hydrated states differ from the one established in the crystal. Let us recall here that the crystal structure determination of Amphotericin B actually concerns a neutral derivative, in which the ammonium group is substituted by a bulky N-iodoacetyl substituent. Furthermore, this study enabled us to characterize a particular binding arrangement for one water molecule, bridging together the COO⁻ and NH₃⁺ groups (and possibly the sugar OH group). This arrangement is reminiscent of the representation recently proposed by Hervé et al. (ref. 7). A complementary (or competitive) H-bond may occur between this water molecule and the β-OH group of a sterol molecule in complexes of Amphotericin B with sterols, and work along these lines is in progress.


APPENDIX

We will show that the choice of the constitutive molecular fragments used in calculating the charge distribution of the whole molecule is not without consequence for the final quantitative results. We have derived:

a) multipoles of the fragments used in the calculation of inter-fragment interactions: Ma;

b) multipoles of the polar head in two conformations, the folded one B₂ and the open one A. These two multipolar distributions are denoted Mb2 and Mb1, respectively.

We have calculated the intramolecular energy difference between the folded (B₂) and open (A) conformations using these three multipolar distributions.


Fragmentations   ΔE_El   ΔE_Intra
Ma               -38.3     -37.5
Mb1              -39.4     -34.9
Mb2              -33.8     -30.3

Table 5. Variation of the intramolecular energy ΔE_Intra for the different fragmentations. The energy of conformation A is taken as the zero of energy. Subscripts: El for electrostatic. (All values in kcal/mol.)


Our results listed in Table 5 clearly show that Ma, Mb1 and Mb2 lead to the same qualitative conclusion, namely the greater stability of the folded conformation B₂, but the electrostatic component (and thus the intramolecular energy) depends on the choice of the fragments used in the calculation of the multipolar distribution. We can reasonably think that the Mb2 multipoles are the most accurate ones, since they have been obtained from a conformation not involving any intramolecular H-bonds. This is not the case for the Mb1 distribution, which should reflect a rather large charge transfer between COO⁻ and NH₃⁺ (the groups involved in an intramolecular H-bond); the Mb1 multipoles are therefore characteristic of the conformation of the fragment, in agreement with Gresh et al. (ref. 13). We think that this would induce a bias in conformational calculations. On the other hand, we have verified that the 7.5 kcal/mol overestimation obtained when using the Ma multipolar distribution mainly results from an overestimation of the negative charge on COO⁻ (-0.874 e instead of -0.728 e calculated with Mb1). It appears that care should be exercised when ionized fragments are involved in the molecule under study, since it is well known that the charge spreads beyond the ionized group. In conclusion, we may reasonably think that accurate multipolar distributions should be obtained if they are calculated from a large fragment in a fully extended conformation.


ACKNOWLEDGEMENTS 

This work was supported in part by Convention 109 RO between the University and the CEA. The authors thank the Groupement Scientifique IBM-CNRS 'Modélisation Moléculaire' for its financial support.




REFERENCES 


1. J. Kotler-Brajtburg, G. Medoff, G.S. Kobayashi, S. Boggs, D. Schlessinger, R.C. Pandey and K.L. Rinehart, Classification of polyene antibiotics according to chemical structure and biological effect, Antimicrob. Agents Chemother. 15 (1979) 716-722.

2. B. Malewicz, H.M. Jenkin and E. Borowski, The repair of membrane alteration induced in baby hamster kidney cells by polyene macrolide antibiotics, Antimicrob. Agents Chemother. 19 (1981) 238-247.

3. B. Malewicz and E. Borowski, Energy dependence and reversibility of membrane alterations induced by polyene macrolide antibiotics in Chlorella vulgaris, Nature 281 (1979) 80-82.

4. M. Saint-Pierre Chazalet, C. Thomas, M. Dupeyrat and C.M. Gary-Bobo, Amphotericin B-sterol complex formation and competition with egg phosphatidylcholine: a monolayer study, Biochim. Biophys. Acta 944 (1988) 477-486.

5. N. Ockman, Interaction of Amphotericin B with monolayers of egg lecithin and cholesterol: polarized absorption spectra, Biochim. Biophys. Acta 345 (1974) 263-282.

6. E.J. Dufourc, I.C.P. Smith and H.C. Jarrell, Amphotericin B and model membranes. The effect of Amphotericin B on cholesterol-containing systems as viewed by ²H-NMR, Biochim. Biophys. Acta 776 (1984) 317-329.

7. M. Hervé, J.C. Debouzy, E. Borowski, B. Cybulska and C.M. Gary-Bobo, The role of the carboxyl and amino groups of polyene macrolides in their interactions with sterols and their selective toxicity. A ³¹P-NMR study, Biochim. Biophys. Acta 980 (1989) 261-272.

8. P. Ganis, G. Avitabile, W. Mechlinski and C.P. Schaffner, Polyene macrolide antibiotic Amphotericin B. Crystal structure of the N-iodoacetyl derivative, J. Amer. Chem. Soc. 93 (1971) 4560-4564.

9. T.J. Perun and R.S. Egan, The conformation of erythromycin aglycones, Tetrahedron Lett. (1969) 387-390.

10. N. Gresh, P. Claverie and A. Pullman, Intermolecular interactions: reproduction of the results of ab initio supermolecule computations by an additive procedure, Int. J. Quantum Chem. Symp. 13 (1979) 243-253.

11. E. Vigne-Maeder and P. Claverie, The exact multicenter multipolar part of a molecular charge distribution and its simplified representations, J. Chem. Phys. 88 (1988) 4934-4948.

12. H. Berthod and A. Pullman, Molecular potential, cation binding, and hydration properties of the carboxylate anion. Ab initio studies with an extended polarized basis set, J. Comput. Chem. 2 (1981) 87-95.

13. N. Gresh, P. Claverie and A. Pullman, Theoretical studies of molecular conformation. Derivation of an additive procedure for the computation of intramolecular interaction energies. Comparison with ab initio SCF computations, Theoret. Chim. Acta 66 (1984) 1-20.

14. H. Rinnert and B. Maigret, Conformational analysis of Amphotericin B. I. Isolated molecule, Biochim. Biophys. Acta 101 (1981) 853-860.




Modelling of Molecular Structures and Properties. Proceedings of an International Meeting, Nancy, France, 11-15 September 1989, J.-L. Rivail (Ed.)

Studies in Physical and Theoretical Chemistry, Volume 71, pages 265-272
© 1990 Elsevier Science Publishers B.V., Amsterdam - Printed in The Netherlands


TRANSPORT IN BIOLOGICAL MEMBRANES. MODELISATION AND EXPERIMENTS.

E. GARCIN¹, F. BAROS¹, J.C. ANDRE¹, D. DAVELOOSE², J. VIRET², M.L. VIRIOT¹ and M. DONNER³

¹ GRAPP-DCPR-UA 328 of CNRS, ENSIC, 1 rue Grandville, BP 451, F-54001 NANCY Cedex

² CRSSA, Laboratoire de Biophysique, BP 87, F-38702 LA TRONCHE Cedex

³ U 284 of INSERM, Plateau de Brabois, CO 10, F-54511 VANDOEUVRE LES NANCY Cedex

SUMMARY 

Studies of biological membranes, and in particular of membrane dynamics, require several physical techniques of investigation, which allow an order parameter or lateral diffusion coefficients to be defined. For simple model membranes (mixtures of pure phospholipids and purified proteins), we have shown that proteins act as obstacles to the transport of lipids. The experimental data come from three complementary techniques: time-resolved fluorescence spectroscopy (fluorescence quenching of pyrene), electron spin resonance (ESR) and differential scanning calorimetry. The modelling draws on concepts of liquid-state physics in pseudo-two-dimensional systems as well as on the classical Smoluchowski equation of transport.

INTRODUCTION 

Biological membranes play an important role in living processes. Their functions allow them to catalyse chemical reactions as well as to take part in transport phenomena. They are mainly made up of phospholipids, proteins and carbohydrates. The interaction between proteins and lipids is a subject of great interest in view of a better understanding of such mechanisms. For example, it is known that the activity of proteins is maximal in the presence of a given heterogeneous lipid composition. Moreover, protein-lipid interactions are thought to influence lateral diffusion in membranes. However, characterizing the action of each component is quite difficult because of the complexity of membrane composition. For this reason, our approach has been to study first the properties of simple model membranes made up of a single phospholipid, and then to introduce purified proteins into them.

In order to define some microscopic properties of membranes, we used two classical parameters: the order parameter S (Zwetkoff parameter) and the lateral diffusion coefficient D. They are obtained directly from experiments (fluorescence quenching for D, ESR for S), and their variations with the experimental conditions are assumed to have a real physical significance. Of course, we have assumed that the probes do not perturb the membranes and that they can be modelled in the same way as the phospholipids.

The characteristic time explored by each method determines the kind of physical phenomenon it sees. For fluorescence spectroscopy with pyrene as the probe, this time is about 10⁻⁷ s, and for ESR about 10⁻⁸ to 10⁻⁹ s. ESR therefore gives information about the rotation of small molecules in the immediate vicinity of the probe, whereas pyrene explores larger domains and gives information on the lateral diffusion of phospholipids.
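The difference in characteristic times translates into a difference in spatial scale. As a rough illustration only, the sketch below uses the two-dimensional mean-square displacement ⟨r²⟩ = 4Dt with a diffusion coefficient of the order of those in Table 1 (3 × 10⁻⁷ cm²/s, an illustrative value). ESR actually probes rotational motion, so its entries merely show how little translational motion occurs on that timescale.

```python
import math

def rms_displacement_2d(D_cm2_s, t_s):
    """Root-mean-square displacement (cm) for free diffusion in 2D: <r^2> = 4 D t."""
    return math.sqrt(4.0 * D_cm2_s * t_s)

D = 3.0e-7  # cm^2/s, illustrative value of the order of those in Table 1
for label, t in [("fluorescence, pyrene", 1e-7), ("ESR, upper bound", 1e-8), ("ESR, lower bound", 1e-9)]:
    r_nm = rms_displacement_2d(D, t) * 1e7  # 1 cm = 1e7 nm
    print(f"{label:22s} t = {t:.0e} s -> r_rms = {r_nm:.2f} nm")
```

With these numbers pyrene travels about 3.5 nm during its lifetime, against roughly 0.35 to 1.1 nm over the ESR window.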


1-LATERAL  DIFFUSION  COEFFICIENT  MEASUREMENT 
Case of a pure phospholipid membrane

Fluorescence techniques can be classified mainly into two categories: photobleaching and fluorescence quenching. Both give information on the lateral diffusion coefficient, the latter having the advantage of not modifying the membrane geometry during the measurement. The kinetic scheme is as follows:

$$A \xrightarrow{\;1/\tau\;} \text{decay}, \qquad A + B \xrightarrow{\;k_a(t)\;} \text{products}$$

with τ the natural lifetime of A and ka(t) the apparent rate constant of the reaction.
In this work, we used pyrene as B and electronically excited pyrene as A.

A model for diffusion-limited reactions in two dimensions has already been developed (ref. 1), and we recall its main features here. The basic idea is to adapt the classical three-dimensional Smoluchowski model by considering a homogeneous system of cylindrical molecules B around a given cylindrical molecule A. The solvent is assumed to be a two-dimensional continuum and, if one assumes that each collision between an A and a B molecule leads to reaction, the apparent rate constant of the reaction, ka(t), can be calculated by solving the classical equation:


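$$\frac{\partial \phi}{\partial t} = \frac{D}{r}\,\frac{\partial}{\partial r}\left[r\left(\frac{\partial \phi}{\partial r} + \frac{\phi}{k_B T}\,\frac{\partial \chi}{\partial r}\right)\right] \qquad (1)$$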


in which:

- φ is the ratio of the configurational distribution function of B to the mean concentration of B,

- D is the mutual diffusion coefficient of A and B,

- kB is the Boltzmann constant,

- T is the absolute temperature,

- χ is the apparent potential, calculated from the two-dimensional radial distribution function g(r) according to χ(r) = -kB T ln g(r).


Of course, some refinements can be made concerning:

-  the  thickness  of  the  bilayer  membrane, 

-  the  curvature  of  the  membrane, 

- a random distribution of reactants in the membrane, following a Poisson distribution.

All of them lead to an apparent rate constant of the form:


$$k_a(t) = 2\pi N D f \sigma\left[\frac{1}{2\sigma} + \frac{1}{\sqrt{\pi D t}}\right] \qquad (2)$$


where f is the correction factor arising from the preceding refinements, σ the reaction distance of A and B, and N the Avogadro number.
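The short-time form of the rate constant can be checked numerically. The sketch below solves equation (1) by explicit finite differences in dimensionless units, assuming an ideal two-dimensional solvent (χ = 0, i.e. g(r) = 1), an absorbing boundary at the contact distance σ, and f = 1. It reports ka(t)/(Nf) as the diffusive flux 2πσD ∂φ/∂r at r = σ and compares it with 2πDσ[1/(2σ) + 1/√(πDt)], the leading terms of the classical short-time expansion for an absorbing disk. Grid step, time step and outer radius are illustrative choices, not values from the paper.

```python
import numpy as np

def smoluchowski_rate_2d(D=1.0, sigma=1.0, t_end=0.05, dr=0.01):
    """Explicit finite differences for eq. (1) with chi = 0, phi(sigma) = 0
    (every contact reacts) and phi = 1 initially and far from A.
    Returns (t, k) with k = k_a(t) / (N f) = 2*pi*sigma*D*(dphi/dr at r = sigma)."""
    r_max = sigma + 12.0 * np.sqrt(4.0 * D * t_end)  # far boundary well outside the depleted zone
    r = np.arange(sigma, r_max + dr, dr)
    dt = 0.2 * dr**2 / D                             # inside the explicit stability limit
    n_steps = int(round(t_end / dt))
    phi = np.ones_like(r)
    phi[0] = 0.0                                     # absorbing (reactive) contact
    r_face = 0.5 * (r[1:] + r[:-1])                  # radii midway between grid points
    for _ in range(n_steps):
        flux = r_face * np.diff(phi) / dr            # r * dphi/dr at the faces
        phi[1:-1] += dt * D * np.diff(flux) / (dr * r[1:-1])
        # phi[0] = 0 and phi[-1] = 1 are never updated (fixed boundaries)
    grad = (-3.0 * phi[0] + 4.0 * phi[1] - phi[2]) / (2.0 * dr)  # second-order one-sided
    return n_steps * dt, 2.0 * np.pi * sigma * D * grad

def k_short_time(t, D=1.0, sigma=1.0):
    """Short-time form 2*pi*D*sigma*[1/(2 sigma) + 1/sqrt(pi D t)], per N f."""
    return 2.0 * np.pi * D * sigma * (0.5 / sigma + 1.0 / np.sqrt(np.pi * D * t))

t, k_num = smoluchowski_rate_2d()
print(f"t = {t:.3f}: numerical {k_num:.2f}, short-time formula {k_short_time(t):.2f}")
```

At this early time the two values should agree to within a few per cent. At longer times the short-time form overestimates the rate, because the next term of the expansion is negative and grows as √t.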

Knowing the apparent rate constant, one can derive an analytical form for the fluorescence decay of A in the following simple way:


$$-\frac{1}{[A]}\,\frac{d[A]}{dt} = \frac{1}{\tau} + k_a(t)\,[B] \qquad (3)$$


leading to:

$$[A](t) = [A](0)\,\exp\left(-\alpha t - \beta\sqrt{t}\right) \qquad (4)$$

in which α and β are linear functions of the concentration [B], depending on the diffusion coefficient and the reaction distance.
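Assuming equation (2) takes the short-time form of the classical two-dimensional result, ka(t) = 2πNDfσ[1/(2σ) + 1/√(πDt)], integrating equation (3) makes explicit how α and β depend on [B]. This is a worked derivation under that assumption, not a formula quoted from ref. 1:

$$\ln\frac{[A](t)}{[A](0)} = -\frac{t}{\tau} - [B]\int_0^t k_a(t')\,dt' = -\left(\frac{1}{\tau} + \pi N D f\,[B]\right)t - 4Nf\sigma\sqrt{\pi D}\,[B]\,\sqrt{t}$$

$$\alpha = \frac{1}{\tau} + \pi N D f\,[B], \qquad \beta = 4Nf\sigma\sqrt{\pi D}\,[B]$$

The slope of α against [B] therefore gives D directly and, once D is known, the slope of β gives σ.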

Experiments are made with the classical single-photon counting technique, described elsewhere (ref. 2). Pyrene is incorporated in vesicles, and the solutions are degassed by bubbling argon. Decays are deconvoluted with a fast Fourier transform algorithm and then fitted with equation (4). The slopes of α and β versus [B] give D and σ. Figure 1 shows such experimental results for α in membranes made of different phospholipids, which differ from each other in the length of their carbon chains. All experiments are made above the transition temperature, where membranes are assumed to behave as liquids. Below this temperature, the diffusion coefficients are too small and no variation of α with [B] is observed, as illustrated in Figure 1 for DPPC at 25 °C. This is of course a limitation of the method. Table 1 summarises the values of D obtained at different temperatures for different kinds of phospholipids. Two interesting features clearly appear:

- for a given phospholipid (DLPC), the diffusion coefficient increases with temperature, indicating that the membrane behaves as a liquid above the transition temperature. This was already observed by other authors using different experimental techniques (ref. 3), which supports the predictive value of our model.

- for a given temperature increase above the transition temperature (about 10 °C), the diffusion coefficient increases with the length of the carbon chain, indicating a greater fluidity in the region where pyrene moves. This suggests that the probe is located quite far from the polar heads of the phospholipids.
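The two-stage analysis described above (fit each decay with equation (4), then take the slopes of α and β against [B]) can be sketched as follows. It assumes α = 1/τ + πNDf[B] and β = 4Nfσ√(πD)[B], the forms that follow from the short-time expression for ka(t); it sets N = f = 1 in arbitrary units, skips the FFT deconvolution, and uses synthetic noise-free decays with hypothetical parameters, so it only shows that the procedure recovers D and σ.

```python
import numpy as np

def fit_alpha_beta(t, decay):
    """Linear least squares on ln[A](t) = ln[A](0) - alpha*t - beta*sqrt(t) (eq. 4)."""
    X = np.column_stack([np.ones_like(t), -t, -np.sqrt(t)])
    coef, *_ = np.linalg.lstsq(X, np.log(decay), rcond=None)
    return coef[1], coef[2]  # alpha, beta

def recover_D_sigma(B, alphas, betas, N=1.0, f=1.0):
    """Slopes of alpha and beta versus [B] -> D and sigma (assumed linear forms)."""
    slope_a = np.polyfit(B, alphas, 1)[0]   # = pi * N * D * f
    slope_b = np.polyfit(B, betas, 1)[0]    # = 4 * N * f * sigma * sqrt(pi * D)
    D = slope_a / (np.pi * N * f)
    sigma = slope_b / (4.0 * N * f * np.sqrt(np.pi * D))
    return D, sigma

# Hypothetical parameters in arbitrary units (not the paper's data)
tau, D_true, sigma_true = 1.0, 0.5, 0.2
t = np.linspace(0.01, 5.0, 400)
B = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
alphas, betas = [], []
for b in B:
    alpha = 1.0 / tau + np.pi * D_true * b
    beta = 4.0 * sigma_true * np.sqrt(np.pi * D_true) * b
    decay = np.exp(-alpha * t - beta * np.sqrt(t))   # normalized decay, [A](0) = 1
    a_fit, b_fit = fit_alpha_beta(t, decay)
    alphas.append(a_fit)
    betas.append(b_fit)

D_fit, sigma_fit = recover_D_sigma(B, np.array(alphas), np.array(betas))
print(D_fit, sigma_fit)
```

With real data the decays are noisy and must first be deconvoluted from the instrument response, so the slopes would be taken from a weighted regression rather than from exact values.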


Fig. 1. Evolution of α (see text) versus the ratio R = [pyrene]/[phospholipid] for different kinds of phospholipids (see nomenclature) at different temperatures.


Phospholipid (Tm)   Number of carbons   T (°C)   D (10⁻⁷ cm²/s)
DLPC (0 °C)         12                  15       1.7
                                        25       2.9
                                        35       4.4
                                        50       5.9
DMPC (24 °C)        14                  35       3.1
DPPC (41 °C)        16                  50       6.3
DSPC (59 °C)        18                  67       6.4

Table 1. Experimental values of D (transition temperatures are given in parentheses).


Case of phospholipid-protein mixtures

We now use DPPC as the phospholipid and GGT as the incorporated protein. The surface of the hydrophobic part of this protein has been estimated at about 375 Å² by comparison with other known proteins (ref. 4). It is assumed to be a cylinder, nearly immobile compared with the fast displacements of pyrene. All proteins are assumed to be incorporated in the membranes, and the diffusion coefficients are measured by the method described above for different protein concentrations (Figure 2). The effect is clearly a reduction of the diffusion coefficient as the surface occupied by proteins increases, indicating that this protein has a hardening effect on the membrane, similar to that of other peptides or enzymes (ref. 5).

This effect has again been modelled on the basis of the Smoluchowski equation, by assuming a spatial variation of the diffusion coefficient caused by the presence of the immobile protein. This model has been described in ref. 6. Its main result is the prediction that the mean diffusion coefficient decreases linearly with the surface occupied by the protein. This straight line is also shown in Figure 2; given the simplifications made, the agreement with experiment is satisfactory, even though the model seems to overestimate D.
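Figure 2 uses the surface occupied by proteins as abscissa. Converting a protein/lipid molar ratio into that area fraction needs an area per lipid, which the text does not give. The sketch below takes the 375 Å² hydrophobic cross-section quoted above for GGT and a hypothetical 60 Å² per lipid, and treats the membrane as a single two-dimensional layer, ignoring the bilayer structure; both choices are assumptions.

```python
def protein_area_fraction(protein_per_lipid, a_protein=375.0, a_lipid=60.0):
    """Fraction of membrane area occupied by protein in a single-layer 2D picture.
    protein_per_lipid: moles of protein per mole of lipid.
    a_protein: hydrophobic cross-section of GGT quoted in the text (Å^2).
    a_lipid: area per lipid (Å^2); a hypothetical placeholder, not given in the paper."""
    area_protein = protein_per_lipid * a_protein
    return area_protein / (area_protein + a_lipid)

for ratio in (0.0, 0.005, 0.01, 0.02):
    print(f"protein/lipid = {ratio:.3f} -> area fraction = {protein_area_fraction(ratio):.3f}")
```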


Fig. 2. Variations of the diffusion coefficient of pyrene with the surface occupied by proteins: experiments and model.


2-ELECTRON  SPIN  RESONANCE  AND  DIFFERENTIAL  SCANNING  CALORIMETRY 
EXPERIMENTS 

We again study the DPPC-GGT system, into which the paramagnetic probes 5NS or 16NS are incorporated. These probes are assumed to explore, respectively, the region near the polar heads and the region near the center of the bilayer. Figures 3 and 4 show the typical ESR spectrum for each probe, as well as the parameters used to characterize the order around the probe. They also show the variations of these parameters with temperature for different protein concentrations. The first general remark is that 16NS gives information below and above the transition temperature, with clear evidence of the transition. This is not the case for 5NS, for which the transition is very smooth, nearly invisible at low protein concentration. A second remark concerns the disappearance of the pretransition (T = 32 °C) when proteins are incorporated. It then appears that the two probes give apparently conflicting results. Indeed, Figure 3 (16NS) shows a fluidizing effect below the transition temperature and a hardening effect above it, in agreement with the results obtained by fluorescence quenching. On the other hand, 5NS exhibits a hardening effect below the transition temperature.
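The introduction states that S is obtained from ESR, while Figures 3 and 4 report raw spectral parameters (h and A∥). The text does not give the conversion. One commonly used expression for doxyl stearic acids combines the outer and inner hyperfine splittings A∥ and A⊥ with the principal values (Axx, Ayy, Azz) of the nitroxide hyperfine tensor and a polarity correction. The sketch below assumes that expression and commonly quoted tensor values (6.3, 5.8 and 33.6 G); both are assumptions, not the authors' procedure.

```python
def order_parameter(A_par, A_perp, Axx=6.3, Ayy=5.8, Azz=33.6):
    """Order parameter S from the hyperfine splittings A_par and A_perp (gauss).
    Axx, Ayy, Azz: principal hyperfine values of the nitroxide; the defaults are
    commonly quoted literature values and are an assumption here."""
    a_crystal = (Axx + Ayy + Azz) / 3.0          # isotropic coupling, reference crystal
    a_membrane = (A_par + 2.0 * A_perp) / 3.0    # isotropic coupling in the membrane
    S_raw = (A_par - A_perp) / (Azz - 0.5 * (Axx + Ayy))
    return S_raw * a_crystal / a_membrane        # polarity-corrected S

print(order_parameter(A_par=25.0, A_perp=9.0))   # illustrative splittings only
```

The two limits behave as expected: splittings equal to the rigid-limit tensor values give S = 1, and equal splittings (isotropic motion) give S = 0.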



Fig. 3. 16NS in the membrane: temperature evolution of the parameter h for three protein concentrations.


Fig. 4. 5NS in the membrane: temperature evolution of the parameter A∥ for three protein concentrations.
Moreover, when the concentration of GGT increases, the spectrum of 16NS exhibits an additional peak, evidence that some probes are hindered from rotating freely.


Complementary experiments have been carried out with a macroscopic method, differential scanning calorimetry (DSC). It consists in measuring the variation of the energy needed to heat two samples in the same way: buffer as reference, and the mixture (DPPC + GGT). When a transition occurs, a peak appears, from which one can calculate the enthalpy of transition and a cooperativity number characterizing this transition. Figure 5 shows the decrease of these two quantities as the protein concentration increases.
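The text does not say how the cooperativity number was obtained. A common route, assumed here, is the ratio of the van 't Hoff enthalpy to the calorimetric enthalpy, with ΔH_vH = 4RTm²Cp,max/ΔH_cal for a two-state transition. For a given enthalpy, a sharper peak gives a larger unit, so a falling N corresponds to a broadening transition. The numbers below are illustrative, not the paper's measurements.

```python
R_CAL = 1.987  # gas constant, cal mol^-1 K^-1

def cooperative_unit(Tm_K, cp_excess_max, dH_cal):
    """Cooperative unit size dH_vH / dH_cal for a two-state transition, with
    dH_vH = 4 R Tm^2 Cp,max / dH_cal.
    cp_excess_max: peak excess heat capacity (cal mol^-1 K^-1, per mole of lipid).
    dH_cal: calorimetric enthalpy from the peak area (cal mol^-1)."""
    dH_vH = 4.0 * R_CAL * Tm_K**2 * cp_excess_max / dH_cal
    return dH_vH / dH_cal

# Illustrative numbers only
print(cooperative_unit(Tm_K=300.0, cp_excess_max=100.0, dH_cal=1000.0))
```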
Fig. 5. Variations with protein concentration of the relative enthalpy and of the cooperativity number (N) for multilamellar vesicles of DPPC + GGT.

3-DISCUSSION AND CONCLUSION

Measurement of the lateral diffusion coefficient of pyrene allows the lateral displacement of lipids to be characterized without any assumption about membrane fluidity when proteins are incorporated. Indeed, proteins can decrease the collision frequency of two lipids in two different ways: they may induce a hardening effect in the membrane, or they may act as immobile obstacles that lengthen the path two lipids travel before they collide. It has been noted (ref. 7) that, for a given protein surface, the "hardening" effect is more pronounced for small proteins, which seems to favour the second assumption.

ESR results indicate that a certain fraction of the probes is "locked" by proteins for a time that can be estimated at about 10⁻⁷ s. This phenomenon disappears when the temperature increases, which favours a model of lipids bound to proteins. DSC results give additional information on the number of lipids bound per protein: extrapolating the results of Figure 5 to ΔH = 0 gives about 8 lipids in a ring around each protein.
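The extrapolation to ΔH = 0 can be sketched under a simple assumption: if each protein removes a fixed number n of lipids from the transition, the relative enthalpy falls linearly as 1 - n·x with the protein/lipid molar ratio x, and the extrapolated zero lies at x₀ = 1/n. The points below are hypothetical, generated for n = 8; they are not read from Figure 5, whose concentration units are not stated in the text.

```python
import numpy as np

def bound_lipids_per_protein(protein_per_lipid, relative_enthalpy):
    """Fit dH_rel = intercept + slope * x and return n = 1 / x0,
    where x0 is the protein/lipid ratio at which the fitted line reaches zero."""
    slope, intercept = np.polyfit(protein_per_lipid, relative_enthalpy, 1)
    x0 = -intercept / slope
    return 1.0 / x0

# Hypothetical points generated for n = 8 (not the paper's data)
x = np.array([0.0, 0.02, 0.04, 0.06])
dH = 1.0 - 8.0 * x
print(bound_lipids_per_protein(x, dH))
```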

The apparent contradiction between the results obtained below the transition temperature with 16NS and 5NS can perhaps be explained as follows: if GGT does not penetrate very deep into the membrane, the rotation of 5NS (near the polar heads) will be hindered (hardening effect), whereas the rotation of 16NS (near the center of the bilayer) will be nearly free.

The decrease in enthalpy observed in the DSC experiments can be explained in different ways:

- lipids are bound to proteins,

- lipids are trapped by aggregates of proteins,

- proteins have a global thermodynamic effect on the membrane.

In the first two cases, bound lipids do not participate in the transition, and their number increases with the protein concentration. This phenomenon is, however, independent of the direction of the transition: the same result is observed when the temperature is decreased from the liquid to the rigid phase. This again favours the first assumption, since aggregates are assumed not to exist in the liquid phase, but some authors (ref. 8) have arguments for the third assumption. Although the DSC experiments do not give a clear answer, the ESR experiments still favour the assumption of lipids bound to proteins.

NOMENCLATURE: DLPC: Dilauroylphosphatidylcholine;

DMPC: Dimyristoylphosphatidylcholine;

DPPC: Dipalmitoylphosphatidylcholine; DSPC: Distearoylphosphatidylcholine;

GGT: gamma-glutamyltranspeptidase; 5NS: 5-nitroxide stearic acid; 16NS: 16-nitroxide stearic acid.
SUMMARY

The hydrogen-bonding properties of two classes of photosystem II inhibitors (amide/urea and s-triazine derivatives) were studied with the semi-empirical quantum method AM1, using water, methanol and p-chlorophenol as proton donors. These calculations showed a lower hydrogen-bonding capacity of the triazine ring nitrogens compared with the carbonyl of amide/urea inhibitors. However, some discrepancies arose in the comparison with IR spectrometry data.

INTRODUCTION

The herbicides belonging to the urea/amide, uracil, s-triazine and cyanoacrylate families inhibit photosynthetic electron flow by competing with a plastoquinone molecule for binding at a common site on the D1 protein of photosystem II (PS II). The plastoquinone bound in the D1 site forms the secondary acceptor QB, which reoxidizes the photochemically reduced primary acceptor QA⁻ (ref. 1).

The role of hydrogen bonding in the mechanism of this inhibition, first postulated from QSAR studies (ref. 2), was demonstrated by X-ray crystallography showing the interactions of the triazine terbutryne with the bacterial reaction center (refs. 3,4). Even though hydrogen bonding is likely to contribute only part of the total interaction energy, it seems to be essential for the binding of these inhibitors.

However, PS II centers differ somewhat from bacterial centers (refs. 3,5), and no PS II crystal structure has been obtained so far. Since amide/urea derivatives (diuron-like inhibitors), unlike triazines, are inactive on bacterial photosynthesis, the mechanism of their binding to the PS II site remains unknown. A homology between the -NH-CO- motif common to diuron-like inhibitors and the -NH-CN= pattern present in triazines was proposed long ago (refs. 2,6), and the evidence of a double hydrogen bond formed by the latter in bacterial reaction centers (ref. 3) suggests a similar mechanism of interaction for the former. However, this implies that the energetically unfavoured cis form of the amide would interact with the site. Furthermore, the inhibitory power of diuron-like compounds remains almost the same in triazine-resistant mutants of higher plants, in which the substitution of serine 264 by glycine in the binding pocket induces a 500-fold decrease in the activity of atrazine (ref. 7).

Fig. 1. Structure of the PS II inhibitors.

Phenyl-amides/ureas (phenyl ring substituted by R1 and R2, bearing -NH-CO-Y):

Compound       R1          R2    Y
SWEP           Cl          Cl    O-CH3
PROPANIL       Cl          Cl    CH2-CH3
DIURON         Cl          Cl    N(CH3)2
MONURON        Cl          H     N(CH3)2
FENURON        H           H     N(CH3)2
ISOPROTURON    CH(CH3)2    H     N(CH3)2

CYCLURON: cyclooctyl-NH-CO-N(CH3)2

Triazines: ATRAZINE (R = Cl), AMETRYNE (R = S-CH3)

Since a triazine ring nitrogen or the carbonyl of amide/urea derivatives is assumed to form the interaction with the site, we have compared here the hydrogen-bonding capacities of several structures from these groups of inhibitors with model proton donors: water, methanol and p-chlorophenol. Calculations were performed in vacuo with the semi-empirical quantum chemistry method AM1, and the results were compared with hydrogen-bonding measurements by IR spectroscopy in CCl4. The size of the systems studied, ranging from 25 to 50 atoms, did not allow reasonable ab initio calculations to be performed. AM1 was chosen for this work because it seems to give more reliable results than other semi-empirical methods, both for the molecular properties of the compounds considered here (ref. 8) and for hydrogen-bonding calculations in general (ref. 9).

METHODS 

Calculations 

Structures were constructed and minimized using the SYBYL package with the TRIPOS 5.1 molecular mechanics force field. The resulting sets of coordinates were fully optimized by the quantum chemistry method AM1 1.0 (ref. 10). Atomic charge densities were calculated for the optimized structures.

Infrared spectroscopy

Hydrogen bonding between some of the inhibitors studied and p-chlorophenol was measured by IR spectrometry, from the absorbance of the stretching vibration of the free (non-bonded) phenolic hydroxyl. The solvent was carbon tetrachloride, in cuvettes of 1 cm optical path. Concentrations of phenol and inhibitors were kept low enough (1 to 4 mM) to avoid self-association.

RESULTS 

Pyridine  -  H2O 

In their evaluation of AM1 for hydrogen-bonding studies, Buemi et al. (ref. 10) conclude that this method gives better results than other semi-empirical quantum chemistry methods, although energies are still underestimated and H-bond distances overestimated compared with experimental data. We reproduced these results for the pyridine + H2O complex by a single full optimization.


Two other procedures were tested in order to determine both H-bond energy and distance (Fig. 2):

- In the first procedure, the geometry was varied by decreasing the N...H...O distance without optimizing either the internal coordinates or the orientation of the water relative to the pyridine.

- In the second procedure, the whole complex was fully optimized at each step, allowing the orientation of the water molecule to rearrange.

The second procedure gave a lower heat of formation at the minimum and a shorter hydrogen-bond distance. The water molecule oriented itself so that both protons interacted with the pyridine nitrogen.


Fig. 2. Variation of the heat of formation of the water-pyridine complex during formation of the hydrogen bond, modelled by different procedures.


Inhibitors - H2O

These procedures were applied to triazine and amide/urea structures, giving rise to particular features due to the orientation of the water molecule, which may lead to the formation of more than one hydrogen bond.

In order to evaluate the capacity of the inhibitors to form single hydrogen bonds with the protein site, we fixed the orientation of the water molecule so that only one bond could be formed. The internal coordinates of both molecules and the distances between them were then optimized. The results in Table 1 show a clear distinction between N···H-O (atrazine-like) and O···H-O (diuron-like) bonds, and a weaker bonding capacity of amides (propanil, swep) compared with ureas, in which the dimethylamino substituent enhances the nucleophilicity of the carbonyl. However, the substituents on the N-phenyl had no influence on the energy of the bonds formed by the carbonyl.

TABLE 1a. Model of hydrogen bond formation with water.

MOLECULE       Hf°      δ⁻X       d X..H..O
SWEP           3.986    -0.3806   3.10
PROPANIL       4.022    -0.3406   3.15
DIURON         4.207    -0.3814   3.09
MONURON        4.216    -0.3825   3.09
FENURON        4.251    -0.3845   3.09
ISOPROTURON    4.401    -0.3841   3.08
CYCLURON       4.419    -0.4020   3.07
AMETRYNE       2.373    -0.2718   3.48
ATRAZINE-2     1.513    -0.2864   3.63
ATRAZINE-3     1.969    -0.2822   3.55

TABLE 1b. Model of hydrogen bond formation with p-chlorophenol.

MOLECULE       Hf°      δ⁻X       d X..H..O
SWEP           4.547    -0.3806   3.08
PROPANIL       4.850    -0.3406   3.08
DIURON         4.978    -0.3814   3.04
MONURON        5.157    -0.3825   3.03
FENURON        5.280    -0.3845   3.05
ISOPROTURON    5.101    -0.3841   3.04
CYCLURON       5.563    -0.4020   3.04
AMETRYNE       2.517    -0.2718   3.40
ATRAZINE-2     2.602    -0.2864   3.48
ATRAZINE-3     2.531    -0.2822   3.69

Hf° (kcal/mol) is the difference between the sum of the heats of formation of both molecules and that of the optimized complex; δ⁻X is the partial charge on the acceptor atom X of the studied molecule; d X..H..O (Å) is the characteristic distance in the optimized complex. The H-bond geometry is fixed, as is the X..H-O angle (180°).
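The contrasts drawn from Table 1 can be made quantitative by summarising the tabulated Hf° values per class. The sketch below only re-uses the values of Tables 1a and 1b; counting ATRAZINE-2 and ATRAZINE-3 as two acceptor sites within the triazine class is our reading of the tables.

```python
from statistics import mean

# Hf° values (kcal/mol) copied from Table 1a (water) and Table 1b (p-chlorophenol)
WATER = {"SWEP": 3.986, "PROPANIL": 4.022, "DIURON": 4.207, "MONURON": 4.216,
         "FENURON": 4.251, "ISOPROTURON": 4.401, "CYCLURON": 4.419,
         "AMETRYNE": 2.373, "ATRAZINE-2": 1.513, "ATRAZINE-3": 1.969}
PHENOL = {"SWEP": 4.547, "PROPANIL": 4.850, "DIURON": 4.978, "MONURON": 5.157,
          "FENURON": 5.280, "ISOPROTURON": 5.101, "CYCLURON": 5.563,
          "AMETRYNE": 2.517, "ATRAZINE-2": 2.602, "ATRAZINE-3": 2.531}
TRIAZINES = {"AMETRYNE", "ATRAZINE-2", "ATRAZINE-3"}

def summarize(table):
    """Mean Hf° per class and the spread among the carbonyl (amide/urea) inhibitors."""
    carbonyl = [v for k, v in table.items() if k not in TRIAZINES]
    triazine = [v for k, v in table.items() if k in TRIAZINES]
    return {
        "carbonyl_mean": mean(carbonyl),
        "triazine_mean": mean(triazine),
        "carbonyl_range": max(carbonyl) - min(carbonyl),
    }

for name, table in (("water", WATER), ("p-chlorophenol", PHENOL)):
    s = summarize(table)
    print(f"{name:15s} amide/urea mean {s['carbonyl_mean']:.3f}, "
          f"triazine mean {s['triazine_mean']:.3f}, amide/urea range {s['carbonyl_range']:.3f}")
```

With water, the amide/urea values average about twice the triazine values and span about 0.43 kcal/mol; with p-chlorophenol the amide/urea spread roughly doubles.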

Similar results were obtained with methanol instead of water, but the hydrogen-bonding energies were too weak to allow a distinction to be made between the urea/amide derivatives.


Inhibitors  -  p-chlorophenol 

With p-chlorophenol instead of water as the hydrogen donor, the energy was still clearly lower for atrazine-like than for diuron-like inhibitors. The bonding energies of the amide/urea derivatives varied over a wider range than with water, and the influence of the substituents on the N-phenyl became significant.

Fig. 3. Relation between the association constant K of p-chlorophenol/inhibitor complexes and the H-bond energy calculated by AM1.

IR  spectroscopy 

In the presence of inhibitors, the intensity of the free OH vibration of p-chlorophenol (3610 cm⁻¹) decreased, and a hydrogen-bonded band appeared at lower frequencies, but only for amides/ureas. For triazines, a featureless absorption was detected, which can be ascribed both to the presence of three non-equivalent ring nitrogens and to a greater delocalization of the proton (ref. 11).

The IR association constants (K) of atrazine-like and diuron-like compounds with p-chlorophenol fall in the same range. For the latter compounds, the strength of hydrogen bonding, consistently measured by both the K values and the frequency shifts of the bonded OH bands (data not shown), indicates a much greater influence of the substituents on the N-phenyl than that predicted by computation (Fig. 3).
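The text does not detail how K was extracted from the spectra. A common approach, assumed here, uses the decrease of the free OH band: by Beer-Lambert, its absorbance is proportional to the free phenol, so the complex and the free acceptor follow by mass balance for a 1:1 complex. The absorbances below are illustrative, with concentrations inside the 1 to 4 mM range quoted in the Methods.

```python
def association_constant(A_ref, A, c_phenol, c_acceptor):
    """1:1 association constant K (M^-1) from the free-OH band of p-chlorophenol.
    A_ref: absorbance of the free OH band with phenol alone (total c_phenol, M).
    A: absorbance of the same band after adding the acceptor (total c_acceptor, M).
    Assumes Beer-Lambert behaviour, identical path length and no self-association."""
    free_phenol = c_phenol * A / A_ref      # free OH absorbance scales with free phenol
    complex_ = c_phenol - free_phenol        # phenol bound in the 1:1 complex
    free_acceptor = c_acceptor - complex_
    return complex_ / (free_phenol * free_acceptor)

# Illustrative numbers only
print(association_constant(A_ref=0.50, A=0.25, c_phenol=2e-3, c_acceptor=4e-3))
```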

DISCUSSION 

The hydrogen-bonding calculations presented here show a clearly lower proton-accepting capacity of the triazine ring nitrogens compared with the oxygen of amide/urea and other carbonyl-containing inhibitors.

Some discrepancies arise when these results are compared with experimental hydrogen-bonding measurements:

- the clear distinction between atrazine-like and diuron-like inhibitors resulting from AM1 calculations is not supported by the IR-measured association constants. However, the shapes of the hydrogen-bonded OH bands are quite different for these two classes of inhibitors, which can be ascribed to different properties of the N···H-O and O···H-O bonds (ref. 11).

- the influence of the N-phenyl substituents on the strength of the hydrogen bonds formed by diuron-like inhibitors was greater when measured by IR spectrometry than when calculated. QSAR studies have already shown that these substituents also influence the inhibitory power through an electronic inductive effect (ref. 2).
Differences in the proton-accepting capacities of atrazine-like and diuron-like PS II inhibitors could help to explain their contrasting behaviour towards triazine resistance, under the hypothesis of a common mechanism of interaction with the D1 inhibition site.
SUMMARY


A method of 3D pharmacophore recognition has been developed in order to compare two molecules of identical biological activity. The PPSP3 program takes into account the receptor topology and proposes pharmacophore hypotheses based on purely geometrical criteria. A first attempt was made a few years ago by C. Leroy (ref. 1). PPSP3 has since been improved in order to handle more complex pharmacophores, and fast energy calculations have been incorporated. The method has been applied to the classical morphine-enkephalin comparison. One of the PPSP3 solutions indicates a tyrosine pharmacophore that differs from the classical model.

INTRODUCTION 


Recent work on opiate compounds has shown 3 different types of receptors: μ (for morphine), δ (enkephalin) and κ (benzomorphan) (ref. 2). Since the discovery of enkephalins (ref. 3), numerous studies have proposed conformational models by comparing enkephalins and morphinic compounds. As very small changes in conformation give rise to large variations in selectivity towards the receptor, it is widely accepted that enkephalins, which are typical δ ligands, could interact with the μ site and induce biological effects specific to the μ receptor (ref. 4). Starting from the hypothesis of a common interaction at the μ site, the PPSP3 program has been applied to the 3D pharmacophore search between morphine (rigid active analog) and Leu-enkephalin (flexible analog). The results show that two molecules may share a common pharmacophore without superposition of their geometrical backbones. A conformational study of the pharmacophore hypotheses has been made with Scheraga's program ECEPP (Empirical Conformational Energy Program for Peptides) (ref. 5). The most favoured solution of the program has been compared and superimposed with crystallographic structures.
METHODS

Input PPSP3 data.

The atomic coordinates come from data banks such as the Cambridge database for small molecules or Brookhaven for peptides. When the crystal coordinates of the rigid analog are unknown, the data are obtained by geometrical optimization with molecular mechanics (MM2 program) or MAD (Molecular Advanced Design). In a first step, the coding of the molecules has to take into account their interactions with the receptor. A molecule is defined by its interaction centers with the hypothetical receptor: classical atoms (oxygen, nitrogen...) play a fundamental role in the interaction and can be defined as active centers. The lone pair of a heteroatom or the direction of an aromatic group define a second class of points on the hypothetical receptor, named "complementary active centers". The choice of the coded atoms is made by the user of the program. An example of the second class of coded points is given by the two active centers representing a phenyl group (Fig. 1).

Fig. 1. Illustration of two active centers, the phenyl center Cp and its direction Vp, representing a phenyl ring (the distance between the two points is 3.4 Å).
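The two phenyl active centers can be generated from ring coordinates. The sketch below places Cp at the centroid of the six ring atoms and Vp at 3.4 Å from Cp; taking the ring normal as the "direction", with an arbitrary sign, is our assumption, since the text does not specify how Vp is oriented.

```python
import numpy as np

def phenyl_active_centers(ring_xyz, offset=3.4):
    """Return (Cp, Vp) for a six-membered aromatic ring.
    Cp: centroid of the ring atoms. Vp: point at `offset` Å from Cp along the ring normal.
    Using the normal (and its arbitrary sign) as the direction is an assumption."""
    ring = np.asarray(ring_xyz, dtype=float)
    cp = ring.mean(axis=0)
    # normal of the best-fit plane = right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(ring - cp)
    normal = vt[-1] / np.linalg.norm(vt[-1])
    return cp, cp + offset * normal

# Example: an idealized benzene ring (C-C = 1.39 Å) lying in the xy plane
hexagon = [[1.39 * np.cos(k * np.pi / 3), 1.39 * np.sin(k * np.pi / 3), 0.0] for k in range(6)]
cp, vp = phenyl_active_centers(hexagon)
print(cp.round(3), vp.round(3))
```

The SVD fit keeps the construction meaningful for slightly puckered rings taken from crystal or minimized structures.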


Pharmacophore Search.


a-  Basis  of  the  method. 


Each study includes one rigid molecule and an active flexible analog (with at most 6 degrees of freedom). The method is based on the comparison of two sets of active centers, one being fixed and the other (belonging to the flexible analog) being re-evaluated for each newly investigated conformation. As each new conformation of the flexible molecule defines a set of active centers, the goal of PPSP3 is to find the set of points common to the two molecules. This superposition (maximum number of superimposed active centers) is then kept as a solution of the program. The first step of this comparison is the elimination of the origin of coordinates.

The autoconvolution product is the conversion of two active centers A and B (origin O) into a vector AB represented by a norm and a direction. An example of this mathematical procedure applied to three points A, B, C is shown in Fig. 2.
Fig. 2. Autoconvolution product on A, B, C.

In general, N(N-1)/2 vectors are associated with a set of N active centers (A1, A2, ..., AN). As the scalar product of AB with AC gives their relative orientation (the angle α between the two vectors, Fig. 3), a pharmacophore of 3 points is found if the Xn values (here n = 3) of the two sets of active centers correspond.


$$X_1 = \vec{AB}\cdot\vec{AB}, \qquad X_2 = \vec{AB}\cdot\vec{AC} = AB\,AC\cos\alpha, \qquad X_3 = \vec{AC}\cdot\vec{AC}$$

Fig. 3. X1, X2 and X3 determine the comparison between 2 sets of vectors, α
being the angle between AB and AC.
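The quantities of Fig. 3 act as a rotation- and translation-invariant fingerprint of a triplet of active centers, with A as the apex. The sketch below computes X1, X2 and X3 and tests whether a triplet from the flexible analog reproduces a triplet of the rigid molecule. The tolerance `tol` and the brute-force loop over vertex orderings are assumptions, since the text does not say how PPSP3 decides that two X values correspond; all names and coordinates are hypothetical.

```python
from itertools import permutations

Point = tuple[float, float, float]

def sub(p: Point, q: Point) -> Point:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def dot(u: Point, v: Point) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def invariants(a: Point, b: Point, c: Point) -> tuple[float, float, float]:
    """X1 = AB.AB, X2 = AB.AC (= |AB| |AC| cos(alpha)), X3 = AC.AC, apex A."""
    ab, ac = sub(b, a), sub(c, a)
    return dot(ab, ab), dot(ab, ac), dot(ac, ac)

def triplets_match(t1, t2, tol: float = 0.25) -> bool:
    """True if all three invariants agree within tol (Å^2, hypothetical value)."""
    return all(abs(x - y) <= tol for x, y in zip(invariants(*t1), invariants(*t2)))

def matching_orderings(fixed, flexible, tol: float = 0.25):
    """Vertex orderings of the flexible triplet that reproduce the fixed triplet."""
    return [perm for perm in permutations(range(3))
            if triplets_match(fixed, [flexible[k] for k in perm], tol)]

# Rigid molecule: apex A, then B and C (Å).
fixed = [(0.0, 0.0, 0.0), (3.4, 0.0, 0.0), (0.0, 2.0, 0.0)]
# Same triangle rotated by 90° about z, translated, and listed as C, A, B.
flexible = [(3.0, -2.0, 1.0), (5.0, -2.0, 1.0), (5.0, 1.4, 1.0)]
# A different triangle (AB = 5 Å) that should not match.
other = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 2.0, 0.0)]

print(matching_orderings(fixed, flexible))  # [(1, 2, 0)]
print(matching_orderings(fixed, other))     # []
```

Only the ordering that puts the flexible points back in A, B, C order matches, because the invariants depend on which point is taken as the apex.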

This method is easily generalized to n points or active centers in order
to recognize complex pharmacophores. The program PPSP3, written in Fortran
77, is thus able to indicate superpositions of conformations for an unlimited
number of active centers.

b-  Limitation  of  the  solutions. 

The preceding method is repeated for each new conformation of the
flexible analog. As small peptides can give rise to a very large number of
conformations, PPSP3 lets the user apply the magic-numbers technique in
order to reduce the number of investigated conformations and proposed
pharmacophores. For 3 moving dihedral angles with a step of 9°, (360/9)³,
i.e. 64000, conformations are required, against 520 conformations with the
magic numbers. Since conformations have so far been generated in PPSP3
without regard to their energies, and since the conformation fitted to the
receptor can reasonably be assumed to lie close to a local energy minimum,
the method cannot ignore the conformational energy of the molecule. The
steric energy is calculated by many molecular mechanics programs, such as
Allinger's MM2, or by other molecular modelling software such as MAD
(Molecular Advanced Design). Once bond lengths and angles are well
established, the van der Waals interactions (interactions between non-bonded
atoms) remain the quickest energetic approach. The van der Waals energy is
therefore a good criterion for eliminating high-energy conformations. The
function used for the van der Waals calculation is the well-known exp-6
function of Allinger's programs.
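The 64000 figure is simply (360/9)³ for three dihedral angles sampled every 9°. The sketch below reproduces that count and shows one way to drop high-energy conformations with an exp-6 (Buckingham-type) van der Waals term. The parameters `A`, `B`, `C`, the 5 kcal window and the relative-to-minimum cutoff are hypothetical illustration values, not MM2's or PPSP3's; the magic-numbers sampling itself is not reproduced because the text does not define it.

```python
import itertools
import math

# Full systematic grid: 3 dihedral angles, 9° step -> 40**3 conformations.
STEP_DEG = 9
grid = list(itertools.product(range(0, 360, STEP_DEG), repeat=3))
print(len(grid))  # 64000

# Hypothetical exp-6 parameters (kcal/mol, 1/Å, kcal·Å^6/mol), illustration only.
A, B, C = 40000.0, 3.6, 500.0

def exp6(r: float) -> float:
    """Pair energy A*exp(-B*r) - C/r**6 in kcal/mol. The term turns over and
    diverges to -inf as r -> 0, so it is only meaningful at realistic
    non-bonded distances."""
    return A * math.exp(-B * r) - C / r**6

def vdw_energy(coords, excluded=frozenset()):
    """Sum exp6 over atom pairs i < j, skipping pairs listed in `excluded`
    (e.g. bonded 1-2 and 1-3 pairs, whose geometry is fixed)."""
    return sum(exp6(math.dist(coords[i], coords[j]))
               for i, j in itertools.combinations(range(len(coords)), 2)
               if (i, j) not in excluded)

def screen(conformers, window=5.0, excluded=frozenset()):
    """Indices of conformers whose energy is within `window` kcal/mol of the lowest."""
    energies = [vdw_energy(c, excluded) for c in conformers]
    e_min = min(energies)
    return [k for k, e in enumerate(energies) if e - e_min <= window]

# Two toy two-atom "conformers": a clash at 2.0 Å and a contact at 3.5 Å.
toy = [[(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0)]]
print(screen(toy))  # [1]
```

A relative window is used here so that the cutoff does not depend on the arbitrary zero of the force field; an absolute threshold would work the same way.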

APPLICATION 

PPSP3 is applied to a simplified model including the tyrosine of the
enkephalins. The dihedral angles χ1 and χ2 determine the conformation of the
tyramine moiety. As the correspondence between morphine and the tyrosine of
the enkephalins has seemed obvious ever since the discovery of these
pentapeptides, the direct superposition of the two moieties should be one of
the PPSP3 solutions. Morphine is the basic model of analgesic compounds. It
includes a classical pharmacophore, the phenylpiperidine group (T shape). Its
structure is rigid and its crystal coordinates are well established.
Enkephalins are known to be δ ligands and to induce typical morphinic
biological effects by adopting a different conformation when binding the μ
receptor. The sequence of Leu-enkephalin is Tyr-Gly-Gly-Phe-Leu.

a- Coding of the molecules.

Morphine, taken as the rigid molecule, is coded with 5 active centers
(nitrogen, phenol hydroxyl, and the center and direction of the phenol ring).
The other carbons of the phenethyl group are also coded; this definition
clearly shows the superposition of the 7 active centers. In the same way, the
tyrosine is coded with the same active centers and includes 3 moving
dihedral angles (Fig. 4).
Fig. 4. Coding of the two molecules.


b- PPSP3 pharmacophore search.

The study was carried out with a step of 9° for the 3 dihedral angles of
the tyrosine, using the magic-numbers technique. This search clearly yields
2 solutions, corresponding to 2 different pharmacophores:

- Pharmacophore 1, or direct superposition: this is the obvious superposition,
with correspondence of the 7 active centers (nitrogen lone pairs noted Lone;
center and direction of the aromatic rings noted Cp and Vp).


- Pharmacophore 2: this new conformation shows the superposition of the
same active centers. Except for the nitrogens and the phenethyl carbons, all
the points are superimposed, and the directions of the lone pairs point to a
common area. A classical superposition similar to this case is found in the
QSAR study of the two isomers of 4-phenylpiperidine (Fig. 5).


Fig. 5. Superposition of the 2 isomers of 4-phenylpiperidine, both inducing
analgesia at the morphinic receptor.
With this kind of superposition, the basis of the action of enkephalins at
the μ receptor is considerably different from the classical model. To check
the validity of each solution (pharmacophore 1 or 2), energy calculations are
performed for each conformation.

c- Conformational analysis of the solutions by ECEPP.

Conformational analysis of the PPSP3 solutions (pharmacophores 1 and 2) is
carried out by empirical energy calculations (program ECEPP, Empirical
Conformational Energy Program for Peptides) (ref. 5). Comparison with a crystal
structure of Leu-enkephalin will provide further information to check the
previous solutions. The study was restricted to the first 2 amino acids of
Leu-enkephalin, i.e. the sequence NH3+-Tyr-Gly-COO−, with the nitrogen taken as
protonated at physiological pH. By fixing the tyrosine in a conformation
corresponding to pharmacophore 1 or pharmacophore 2, one can evaluate the
validity of the model through an energy minimization performed with
conformational constraints.

- Study of pharmacophore 1 (Table 1): the values given by ECEPP are in kcal
and correspond to the sequence of dihedral angles (φ, ψ, χ1, χ2) of the direct
superposition. The first set of angles and energy (24.3 kcal) comes from the
first PPSP3 solution. The constraints imposed by pharmacophore 1 (χ1, χ2
fixed) are then relaxed to obtain the stable conformation (5.5 kcal) nearest
to the initial conformation of pharmacophore 1.

TABLE 1

Energy (kcal)    ψ (°)    φ (°)    χ1 (°)    χ2 (°)
24.3             58       -64      -108      14
5.5              58       -64      -77       104

Energy and dihedral angles for pharmacophore 1 (first row) and for its
corresponding relaxed conformation (second row).

The initial energy (24.3 kcal) is very high compared with the minimized
value (5.5 kcal). The final values of χ1 and χ2 are quite different from the
initial values, so pharmacophore 1 defines a conformation far from the nearest
local minimum.

- Study of pharmacophore 2: the initial conformation (φ -64°, ψ -52°,
χ1 200°, χ2 84°) has a lower steric energy, 10.1 kcal. After relaxation, the
dihedral angles reach χ1 = 168° and χ2 = 65°, with a very low energy of
4.8 kcal. The initial energy of this solution is thus about 14 kcal lower than
that of pharmacophore 1 (10.1 against 24.3 kcal), and the dihedral angles after
relaxation remain close to their initial values. Pharmacophore 2 therefore
defines a more stable conformation than the first pharmacophore. For
pharmacophore 1, the angle χ2 is close to 0° (14° in the first row of Table 1);
since the most stable state of tyrosine in free rotation is the staggered
conformation (relative to the phenethyl carbons), pharmacophore 2 reduces the
steric interactions. On the other hand, the conformational constraints of the
rigid morphine induce a high steric energy for pharmacophore 1.
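Judging whether relaxed angles are "close to" or "far from" the initial ones is a comparison of periodic quantities: χ1 = 200° and χ1 = -160° describe the same rotamer. The small sketch below uses only the angles quoted in Table 1 and in the text and computes the signed shortest rotation between each constrained and relaxed value; the helper name `angular_shift` is ours, not ECEPP's.

```python
def angular_shift(initial_deg: float, final_deg: float) -> float:
    """Signed smallest rotation from initial to final angle, in (-180, 180]."""
    d = (final_deg - initial_deg) % 360.0
    return d - 360.0 if d > 180.0 else d

# (initial, relaxed) dihedral angles in degrees, as quoted in the text.
solutions = {
    "pharmacophore 1": {"chi1": (-108.0, -77.0), "chi2": (14.0, 104.0)},
    "pharmacophore 2": {"chi1": (200.0, 168.0), "chi2": (84.0, 65.0)},
}

for name, angles in solutions.items():
    shifts = {k: angular_shift(a, b) for k, (a, b) in angles.items()}
    print(name, shifts)
# pharmacophore 1 {'chi1': 31.0, 'chi2': 90.0}
# pharmacophore 2 {'chi1': -32.0, 'chi2': -19.0}
```

With these numbers, the larger change for pharmacophore 1 comes from χ2 (90°, against -19° for pharmacophore 2).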

For pharmacophore 2, a correlation is made with the crystal structure of
Leu-enkephalin.

d- Superposition of Leu-enkephalin and morphine.

The pentapeptide Leu-enkephalin trihydrate (space group P2₁2₁2₁) was
chosen because it was crystallized from water (similar to the physiological
medium). By comparing the PPSP3 solutions with this stable conformation, the
methodology adopted to check our results can be compared with the previous
one (comparison with structures optimized by molecular mechanics). The
backbone dihedral angles φ and ψ of the different amino acids give the
conformation of the peptide in the crystal (φ2 -60°, ψ2 -30°, φ3 -50°,
ψ3 -60°, φ4 -80°, ψ4 4°). The 3 amino acids Tyr-Gly-Gly adopt an α-helical
conformation, followed by a type I β-turn starting from Phe4. The
superposition with the pentapeptide is possible only by considering the
conformation of the second PPSP3 solution (pharmacophore 2), since a direct
morphine/Leu-enkephalin superposition cannot be obtained: the
nitrogen-to-phenyl-center distances are too different in the 2 molecules:

- distance N - Cp (tyrosine) = 5.2 Å.

- distance N - Cp (morphine) = 4.5 Å.

For the tyrosine, the dihedral angle values φ (-70°), ψ (125°), χ1 (178°)
and χ2 (47°) are close to those obtained for pharmacophore 2. A consequence
of these results is that Leu-enkephalin and morphine adopt conformations at
the μ receptor that differ from the classical model. The phenylalanine moiety
of enkephalin is necessary for the interaction of the peptide with the δ site,
and the orientation of this aromatic ring is dominant with respect to
selectivity; this has been shown by Portoghese with the discovery of new
selective δ ligands (ref. 6). The hypothesis that the enkephalins could
interact with the phenylpiperidine moiety of morphine has been suggested by
Portoghese. As the enkephalin hybrid in which the tyrosine has been replaced
by (-)-metazocine (ref. 7) is inactive, Portoghese concludes that the tyramine
moiety of the opiates does not play the same functional role as in the
peptides. In fact, the inactivity of these compounds could suggest that only
the orientation of the phenylpiperidine ring is crucial for activity at the μ
receptor. A peptide conformation such as the one described for pharmacophore
2 is then essential for an interaction at the μ site. The conclusion is that
the tyramine moieties of morphinic compounds and of enkephalin peptides could
play the same role by adopting different conformations (pharmacophore 1 for
morphine, pharmacophore 2 for Leu-enkephalin).

CONCLUSION

The shape-recognition method used in PPSP3 is useful for determining
common and complex pharmacophores. The program has been applied to the
classical morphine-enkephalin comparison, assuming that the peptide also
binds to the μ site. On purely geometrical criteria, the PPSP3 search gave
2 pharmacophore hypotheses. The direct superposition of the 2 structures
showed a high steric energy, the ECEPP program providing all the energy
information on the studied conformations. Comparison with the crystal
structure of Leu-enkephalin trihydrate has also been performed. Although the
first applications of the PPSP3 pharmacophore search have concerned the
morphine/enkephalin model, the method described could certainly be applied
to other series of drugs.
INTRODUCTION

Structure-activity relationships in a series of mercaptotripeptides
HS-CH2-CH2-CO-Pro-X (X = Ala, Nle, Leu), which are inhibitors of bacterial
collagenase, have shown that adding a methyl group onto the amide nitrogen of X
enhances their inhibitory activity (for example, when X = Ala the Ki value
decreases from 60 × 10⁻⁸ M to 17 × 10⁻⁸ M). One effect of the methyl group is to
restrict the conformational space of the inhibitor. To gain deeper insight into
the conformational perturbation caused by N-methylation, the conformational
properties of the inhibitor with X = N-Me-Ala (Figure 1) were studied by
molecular dynamics simulations and energy minimizations.

COMPUTATIONAL  PROCEDURES 

We  present  here  the  different  techniques  used  in  this  study: 

Conformational  search 

Molecular dynamics simulations were performed at 600 K in order to explore the
conformational space of the molecule efficiently. The initial conformations were
energy-minimized by 200 steps of the adopted-basis Newton-Raphson method and used as
starting points for the molecular dynamics simulations. The temperature of 600 K was
reached by gradually increasing the kinetic energy over 2.4 ps, followed by an
equilibration stage lasting 3 ps. The production trajectories used for the analysis were
run for 140 ps. The results reported here are based on coordinates averaged over 1 ps
periods.

Contour  map 

The adiabatic energy contour map was built by restraining the two dihedral angles of
interest around each grid point with harmonic potentials. The conformer defined at each
grid point was energy-minimized using 200 steps of the adopted-basis Newton-Raphson
method followed by 5 steps of the Newton-Raphson method.




The thermodynamic integration method (1) was used to calculate the free energy
difference between two conformers 1 and 2 defined by a single dihedral angle ω. A
driving potential was added to the potential energy in order to move this dihedral
angle from its initial value ω1 to its final value ω2 (3). The driving potential was

$$V_\lambda = K(\omega - \omega_0)^2, \qquad K = 10\ \text{kcal/mol}, \qquad \omega_0 = (1-\lambda)\,\omega_1 + \lambda\,\omega_2,$$

where the reaction coordinate λ varies from 0 to 1. The free energy difference between
the two conformers was expressed by


$$\Delta A_{1\to 2} = \int_0^1 \left\langle \frac{\partial V_\lambda}{\partial \lambda} \right\rangle_\lambda \, d\lambda$$


The averages ⟨∂Vλ/∂λ⟩λ were calculated for discrete values of λ in a series of
dynamics runs carried out at 300 K. The population ratio of the two conformers was
estimated by the formula:

$$\frac{p_2}{p_1} = e^{-\Delta A_{1\to 2}/RT}$$

The program we developed for the free energy difference calculation was incorporated
into the CHARMM program (3).
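Once the ensemble averages ⟨∂Vλ/∂λ⟩ are available from the dynamics at each λ, the two formulas above reduce to a one-dimensional quadrature and an exponential. The sketch below assumes those averages are already computed (the dynamics themselves are not reproduced), integrates them with the trapezoidal rule (our choice of quadrature), and converts ΔA into populations with RT at 300 K. Function names are hypothetical, and angles are assumed to be in radians so that K carries units of kcal/mol per rad².

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)
K_DRIVE = 10.0        # driving force constant quoted in the text

def wrap(angle: float) -> float:
    """Map an angle difference in radians into [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def dv_dlambda(omega, lam, omega1, omega2, k=K_DRIVE):
    """dV/dlambda for V = k (omega - omega0)^2, omega0 = (1 - lam) omega1 + lam omega2."""
    omega0 = (1.0 - lam) * omega1 + lam * omega2
    return -2.0 * k * wrap(omega - omega0) * (omega2 - omega1)

def trapezoid(xs, ys):
    """Trapezoidal estimate of the integral of y(x) sampled at increasing xs."""
    return sum(0.5 * (x1 - x0) * (y0 + y1)
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def population_ratio(delta_a, temperature=300.0):
    """p2 / p1 = exp(-dA / RT), with dA in kcal/mol."""
    return math.exp(-delta_a / (R_KCAL * temperature))

# With dA of about 2 kcal/mol (TRANS'/TRANS -> TRANS'/CIS), as reported in the results:
ratio = population_ratio(2.0)
p_trans_trans = 1.0 / (1.0 + ratio)
print(round(100 * p_trans_trans, 1))  # 96.6
```

This reproduces the roughly 96% TRANS'/TRANS population quoted in the results from the 2 kcal/mol free energy difference.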


The model considered consists of the solute surrounded by 367 water molecules, with
conventional periodic boundary conditions. For each λ, a 15 ps trajectory was run, and
only its last 4 ps were used for the free energy estimate.

The calculations were performed with the program CHARMM (3). The extended-atom model,
in which hydrogens are incorporated into the heavy atoms to which they are bonded, was
used only for aliphatic hydrogen atoms. The charges used are displayed in Figure 1. In
the vacuum calculations, the electrostatic energy between two atoms was calculated with
a dielectric constant varying with the distance between these atoms; a dielectric
constant equal to unity was used when water molecules were included in the model. These
calculations were carried out on a MicroVAX and on the CRAY X-MP at
CEN Saclay.
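The two dielectric treatments described above can be written as a single pair-energy function. The distance-dependent form ε(r) = r (r in Å) is an assumption: it is the common choice, but the text does not say which distance dependence was used. The constant 332.0716 converts e²/Å to kcal/mol; the function name is hypothetical.

```python
COULOMB_KCAL = 332.0716  # kcal·Å/(mol·e^2)

def coulomb(qi: float, qj: float, r: float, distance_dependent: bool = False) -> float:
    """Electrostatic pair energy in kcal/mol for charges in e and distance r in Å.
    distance_dependent=True uses eps(r) = r (assumed form for the vacuum runs);
    otherwise eps = 1, as in the runs with explicit water."""
    eps = r if distance_dependent else 1.0
    return COULOMB_KCAL * qi * qj / (eps * r)

print(round(coulomb(0.5, -0.5, 3.0), 2))        # -27.67 (eps = 1)
print(round(coulomb(0.5, -0.5, 3.0, True), 2))  # -9.22  (eps = r)
```

With ε = r the interaction decays as 1/r², which damps long-range electrostatics in vacuum as a crude stand-in for solvent screening.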


RESULTS  AND  DISCUSSION 

One of the major degrees of conformational flexibility of the molecule HS-CH2-CH2-
CO-Pro-Ala comes from internal rotation around the Cα-CO bond of proline
(dihedral angle ψ2 in Figure 1).

It has been shown previously that, for a proline residue, the ψ2 angle mainly
adopts three values, ψ2 = -60°, 60° and 150°, corresponding respectively to the
classical most stable conformations CIS', C7 and TRANS'. Because the methyl group
prevents the formation of the C7 conformer (the N-methylated amide has no NH to donate
the C7 hydrogen bond), the molecular dynamics simulations were started only from the
TRANS' and CIS' conformers.
Figure 1: N-methyl inhibitor of bacterial collagenase (charges)

Figure 2a shows the values of the three dihedral angles ψ2, φ3 and ω2 as a function
of time in the trajectory started from the TRANS' conformer. In this case no
conformational change is observed during the 140 ps. In contrast, Figure 2b, which
corresponds to the trajectory started from the CIS' conformer, shows a conformational
change after 40 ps. The transition observed for ψ2 (CIS' to TRANS') is coupled with
that observed for ω2; the latter corresponds to a TRANS to CIS transition of the
N-methylated peptide bond (ω2 = 180° to ω2 = 0°).
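Time series such as those of Figure 2 come from evaluating dihedral angles frame by frame. The sketch below uses the standard atan2 dihedral formula (IUPAC sign convention) and labels the peptide-bond angle ω2 as TRANS or CIS. The ±90° boundary, the helper names and the toy ω2 trajectory are assumptions for illustration, not values from the study.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def omega_state(omega_deg, boundary=90.0):
    """CIS if omega lies within +/-boundary of 0°, TRANS otherwise (hypothetical rule)."""
    return "CIS" if abs(omega_deg) < boundary else "TRANS"

def first_transition(omegas_deg, dt_ps=1.0):
    """Time (ps) of the first frame whose state differs from frame 0, else None."""
    start = omega_state(omegas_deg[0])
    for k, w in enumerate(omegas_deg):
        if omega_state(w) != start:
            return k * dt_ps
    return None

trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(omega_state(trans), omega_state(cis))       # TRANS CIS
print(first_transition([180, 175, -178, 10, 5]))  # 3.0
```

Classifying by |ω| rather than by ω itself keeps values such as -178° in the TRANS state, which is what periodicity requires.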

This study was followed by an investigation of the energy minima corresponding to
the TRANS'/TRANS and TRANS'/CIS conformers found in the molecular dynamics
simulations. Figure 3 shows the adiabatic contour map (ψ2 versus ω2) obtained by the
procedure defined above. The two conformations characterized by (ψ2 = 150°, ω2 = 180°)
and (ψ2 = 150°, ω2 = 0°) correspond to two minima of equal energy, whereas the
CIS'/TRANS conformer corresponds to a high-energy minimum (18 kcal/mol). These results
may explain why the molecular dynamics simulation started from the CIS'/TRANS
conformer converged to a lower-energy conformer, here TRANS'/CIS
(Figure 2b).

The conformers TRANS'/CIS and TRANS'/TRANS, corresponding to the two lowest energy
minima, are the only ones observed in the NMR experiments performed recently in our
laboratory on the same compound. Moreover, these NMR experiments indicate a population
ratio of 95% for the TRANS'/TRANS conformational state and 5% for the TRANS'/CIS one.
This result agrees with the larger area observed for the TRANS'/TRANS conformer
compared with that of the TRANS'/CIS conformer in Figure 2. To evaluate the population
ratio between these 2 conformational states theoretically, we performed free energy
difference calculations.


Figure 4: Free energy (ΔA, kcal/mol) for the transition TRANS'/TRANS to TRANS'/CIS


Figure 4 shows the variation of the free energy difference as a function of λ when
going from the TRANS'/TRANS to the TRANS'/CIS conformation. The free energy difference
between the two conformers is about 2 kcal/mol in favour of TRANS'/TRANS, corresponding
to a population of 96%. Although this result agrees well with the NMR value, longer
simulations are needed to estimate the error.
CONCLUSIONS

The calculations reported here show how efficient molecular dynamics calculations are
for locating low-energy minima. Taking the entropic factor into account appears
essential here to explain the proportions of the different conformers observed
experimentally for the N-methyl compound.

From the biological point of view, it appears that the stabilization of the
TRANS'/TRANS conformation can explain the stronger inhibitory activity of the N-methyl
compounds compared with the non-methylated ones.

Considering the fast kinetics observed for the inhibitory activity of this compound,
the dominance of this conformation in solution, and the large energy barrier around its
energy minimum, it can be postulated that the TRANS' conformation corresponds to that of
the bound state.
Introduction


Seeds of Ecballium elaterium (jumping cucumber) are a rich source of protein
inhibitors of proteinases: we recently isolated, sequenced and characterized the main
trypsin inhibitor [1], which, following the international nomenclature [2], is
designated EETI II.

GCPRILMRCKQDSDCLAGCVCGPNGFCG

Figure 1: The primary structure of EETI II

Its primary structure indicates that it belongs to the squash family [3], a recently
discovered family of serine protease inhibitors [4] consisting of small peptides rich in
disulfide bridges. The chemically synthesized EETI II [5] is the shortest microprotein
known so far to inhibit a serine protease, with a dissociation constant of
1.2 × 10⁻¹¹ M. It contains 28 amino acid residues and three disulfide bridges; its
primary structure is given in Figure 1.

Solution  structure  by  NMR 

Although the primary structure is closely related to those of other microproteins
extracted from various Cucurbitaceae, its chemical synthesis was facilitated by a unique
feature of the C-terminal sequence, where a -Gly22-Pro23-Asn24-Gly25- β-turn-forming
section behaves as a strong template driving correct folding in high yield and
selectivity. Large quantities of EETI II could thus be prepared, enough for a complete
2D NMR study [6]; this study made it possible to describe the different secondary
structure elements of EETI II, but the disulfide bond connectivity remained unclear.
Indeed, this assignment could not be made from a simple examination of the nOe's
between the cysteine residues: Cys2 completely lacked long-range nOe's, Cys21 had nOe's
with both Cys9 and Cys27, while Cys15 exchanged an nOe with Cys19 (see Figure 2).

Figure 2: nOe's involving cysteines in EETI II

Hence, DISGEO modelling of all 15 possible combinations of disulfide bridges made the
assignment of the disulfide bridges possible: 4 structures were characterized by a
minimal number of distance violations. In these structures, the main difference lay in
the conformation of the Gly1-Arg8 strand, governed by the connection imposed on Cys2.
These structures, named S1 to S4 in order of increasing number of distance violations,
are outlined in Figure 3 together with the torsion angles of their disulfide bridges.
Structures S3 and S4 were rejected because of the unacceptable dihedral angle of the
Cys21-Cys27 bridge; structure S2 was rejected because its disulfide bridges are not
interlaced, which cannot account for the apparent rigidity of the
molecule.
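The count of 15 connectivities follows from pairing six cysteines: 5 × 3 × 1 = 15. The short sketch below enumerates them from the EETI II sequence of Figure 1 and picks out the [I-IV, II-V, III-VI] network that the text reports for the knottins, i.e. Cys2-Cys19, Cys9-Cys21 and Cys15-Cys27. It only lists candidates; ranking them still requires the DISGEO distance-geometry step, which is not reproduced here, and the helper names are hypothetical.

```python
EETI_II = "GCPRILMRCKQDSDCLAGCVCGPNGFCG"  # Figure 1, residues 1-28

def cysteine_positions(seq):
    """1-based residue numbers of the cysteines."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "C"]

def pairings(items):
    """Yield every way of splitting an even-length list into unordered pairs."""
    if not items:
        yield ()
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        for tail in pairings(rest[:k] + rest[k + 1:]):
            yield ((first, partner),) + tail

cys = cysteine_positions(EETI_II)
candidates = list(pairings(cys))
# Roman numerals count cysteines in sequence order: I-IV, II-V, III-VI.
knotted = tuple((cys[i], cys[j]) for i, j in ((0, 3), (1, 4), (2, 5)))

print(cys)                     # [2, 9, 15, 19, 21, 27]
print(len(candidates))         # 15
print(knotted in candidates, knotted)  # True ((2, 19), (9, 21), (15, 27))
```

Filtering the candidates that contain the rejected Cys21-Cys27 bridge removes 3 of the 15 at once, since the remaining four cysteines can be paired in only 3 ways.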

A very striking characteristic of this molecule lies in the fact that the Cys15-Cys27
bridge penetrates the macrocycle formed by the two other disulfide bridges and the
strands linking them. This is illustrated in Figure 4.

Figure 4: DISGEO model of EETI II; Cα chain and disulfide bridges.

EETI / trypsin crystal structure.

The radiocrystallographic structures of the complex between the parent Cucurbita maxima
trypsin inhibitor (CMTI I) and bovine trypsin [7], and of our own EETI II / porcine
trypsin complex [8], have been determined recently.

The crystal structures unambiguously confirm our NMR assignment of the disulfide
bridges. It appears that the scissile Arg4-Ile5 bond is not cleaved in the crystal
complex, although the carbonyl group of Arg4 lies in the vicinity (2.4 Å) of the distal
nitrogen atom of His42 of the enzyme. The interaction of the inhibitor with the enzyme
involves the strand running from Gly1 to Leu6, occupying the sites from P4 to P'2.
Strikingly, Leu6 shares the interaction in P'2 (Tyr341) with Phe26. Another interesting
feature lies in the close similarity between the inhibiting loop of EETI II and the
corresponding strand of BPTI: the Cα atoms of residues 3 to 8 of EETI II can be fitted
onto the Cα atoms of residues 14 to 19 of BPTI, in spite of their different sequences,
as shown in Figure 5. This result supports the hypothesis of a variable-sequence,
fixed-conformation loop in a large superfamily of serine proteinase inhibitors [9].

Figure 5: superimposition of the inhibiting loops of EETI II (black line, roman) and
BPTI (gray line, italic).

EETI  II  as  a  vehicle  for  new  functions. 

We have found that EETI II is a unique carrier for new inhibiting functions: active-site
modifications can produce inhibitory activities directed towards other serine
proteinases; furthermore, we succeeded in grafting a second active site, inhibiting
carboxypeptidase A.

Elastase  inhibitors. 

Elastases are serine proteinases homologous to trypsins; the biological role of human
leucocyte elastase in microbial lysis by polymorphonuclear leucocytes has been
elucidated. However, this powerful and poorly selective proteinase is normally
completely inhibited by α1-antitrypsin (a misnamed circulating protein, synthesized in
the hepatocyte, that should be called anti-elastase). Defective secretion of
α1-antitrypsin results in pathogenic elastase activity, directed especially against lung
elastin, a constitutive protein of lung tissue; people with this condition generally
suffer from severe chronic emphysema. It is well known that the substrate specificity of
elastases differs from that of trypsin in the nature of the residue placed in the S1
recognition subsite. Trypsin prefers a basic, positively charged residue such as lysine
or arginine; porcine pancreatic elastase prefers a small, neutral, hydrophobic alanine;
whereas the best substrates of human leucocyte elastase bear a bulky, hydrophobic
valine. The simple substitution of Arg4 of EETI II by alanine or valine induced both the
disappearance of the anti-trypsin activity and the appearance of a very strong
inhibition of pancreatic and leucocyte elastases. Similarly, replacing Arg4 by
phenylalanine promotes an anti-chymotrypsin activity, as shown in the Table [10].

Inhibitor          Trypsin         HLE*            PPE**           α-Chymotrypsin
EETI II [1-28]     1 × 10⁻¹² M
Ala-4                              1.4 × 10⁻⁸ M    2.5 × 10⁻⁸ M
Val-4, Nle-7                       2.2 × 10⁻⁹ M    5 × 10⁻⁶ M
Phe-4, Nle-7                                                       2 × 10⁻⁷ M

Table: Dissociation constants of EETI II analogues towards three serine proteinases
(* human leucocyte elastase; ** porcine pancreatic elastase).

A new family: the knotted proteins.

The striking knotted structure, in which the disulfide bridge between the third and
sixth cysteines crosses the macrocycle formed by the first, second, fourth and fifth
cysteines, was also found in CPI, a carboxypeptidase inhibitor found in potato leaves
[11]. Conotoxin ω [12] recently appeared as a member of this new family [13]. These
three microproteins share the same six-cysteine network [I-IV, II-V, III-VI] with very
little sequence homology, and constitute a new structural family of proteins sharing the
same topology, the "knottins" (Figure 6). The interlocked cysteine connectivity is
necessary but not sufficient for membership of this family: each of the four domains of
wheat germ agglutinin has such a connectivity, but with a planar spiral topology [14].
The recently studied E. coli enterotoxin [15] appears to be in the same situation, an
interlocked but topologically planar molecule [13]. Figure 7 shows that very little, if
any, sequence homology is shared by these three molecules.

Figure 6: the knottins; left: EETI II; middle: CPI; right: conotoxin ω

The folding mechanism of such molecules is especially intriguing, and EETI II will
certainly prove to be an excellent model for such a study.


Figure 7: sequence alignment of the knottins EETI II, CPI, TCPI* and conotoxin ω. (*see below)


TCPI, a chimeric microprotein inhibitor of both trypsin and carboxypeptidase A

These considerations prompted us to synthesize a "chimeric" peptide with the sequence
of EETI II plus the C-terminal tetrapeptide of CPI, in order to check the possibility of
building a double-headed inhibitor directed towards two enzymes with vastly different
active sites.

Figure 7 shows the alignment of the sequences of EETI II and CPI, anchored on the six
cysteine residues. Apart from the cysteines, only six residues are shared by both
inhibitors. The C-terminal inhibiting tail of CPI, with Tyr, Val and Gly in positions
P2, P1 and P'1 of carboxypeptidase, is completely lacking in EETI II; on the other hand,
the inhibiting loop of the trypsin inhibitor, Pro-Arg-Ile-Leu-Met-Arg, is shortened to
three residues in the homologous segment of CPI.

Fitting the coordinates of the six cysteines of each inhibitor/enzyme complex on a
computer graphics screen shows that it would be conceivable to make an inhibitor binding
both enzymes when the two binding sites are present: as these binding sites lie away from
the general plane of the molecule, the two enzymes can sit around a tentative model of
the extended EETI II molecule (TCPI) without any major steric interaction, as seen in
Figure 8.

Figure 8 : Predictive model of the CPA / TCPI / trypsin complex

Indeed, the extended EETI II shows stoichiometric interaction and good inhibitory potency (trypsin: Kd = 1.8 × 10^-9 M; CPA: Kd = 3 × 10^-9 M) towards each of its target enzymes measured separately. However, its inhibitory potency towards trypsin is significantly lower than that of the original EETI II; this loss is attributed to the extended C-terminus, which may impair the ionic interaction of Gly-28 with the N-terminus [7]. By contrast, the inhibition constant towards CPA is similar, if not identical, to that of the original potato inhibitor, which shows both that the primary sequence of the N-terminal part of this molecule has little influence on inhibitory activity and that the overall topology imposed by the arrangement of the disulfide bridges is important.
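As a rough illustration of what the reported dissociation constants imply, the sketch below computes the fraction of enzyme held in a 1:1 complex from the exact quadratic (tight-binding) solution of E + I ⇌ EI. Only the two Kd values come from the text; the enzyme and inhibitor concentrations are hypothetical, and each enzyme is treated on its own, as in the separate measurements.

```python
import math

# Dissociation constants reported for the extended EETI II (TCPI), measured separately
KD_TRYPSIN = 1.8e-9  # mol/L
KD_CPA = 3e-9        # mol/L


def fraction_bound(enzyme_total, inhibitor_total, kd):
    """Fraction of enzyme present as the EI complex for E + I <=> EI.

    Solves the mass-balance quadratic exactly instead of assuming that free
    inhibitor equals total inhibitor, an assumption that breaks down when
    Kd is comparable to the enzyme concentration (nanomolar here).
    All arguments in mol/L.
    """
    s = enzyme_total + inhibitor_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * enzyme_total * inhibitor_total)) / 2.0
    return complex_conc / enzyme_total


enzyme = 10e-9  # hypothetical 10 nM enzyme
for inhibitor in (5e-9, 10e-9, 20e-9, 40e-9):  # hypothetical titration points
    print(f"[I] = {inhibitor * 1e9:4.0f} nM   "
          f"trypsin bound = {fraction_bound(enzyme, inhibitor, KD_TRYPSIN):.2f}   "
          f"CPA bound = {fraction_bound(enzyme, inhibitor, KD_CPA):.2f}")
```

The tighter trypsin Kd gives a slightly higher bound fraction at every titration point; neither value says anything about simultaneous binding, which is what the experiment below addresses.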
Figure 9 : time course of the CPA / TCPI / trypsin complex formation

Figure 10 : molecular sieve elution profile of the trimolecular complex
To ascertain whether the double-headed peptide could react simultaneously with both enzymes, the insoluble 1:1 CPA/peptide complex was solubilized in 1 M NaCl and allowed to interact with a substoichiometric quantity of trypsin.

Both activities were reduced, as shown in Figure 9, and molecular sieve chromatography of the reaction mixture showed a peak in the molecular weight range of 62000, which could correspond to the tri-component complex (Figure 10; M.W. = 61860 for trypsin/peptide/CPA) [16].

Nothing but molecular modelling could have led to such a synthesis. Neither the origin of the two peptides (one from the Cucurbitaceae, the other from the Solanaceae), nor their function (one inhibits a serine protease, the other a zinc protease), nor their sequences (apart from the cysteine positions, they show very little similarity) pointed to it. Only knowledge of the 3D structure, with its knotted arrangement of the disulfide bridges, revealed the similarity and suggested grafting an additional active site onto an already active, albeit different, peptide.
SUMMARY

Using the H3O+(H2O)n system as an example, we study the effect of the energy parameters on Monte Carlo clustering energies of small clusters, paying special attention to the three-body contribution. Quantum mechanical calculations (ab initio SCF and dispersion energies) are performed on specific geometrical structures of the H3O+(H2O)6 cluster, which allows three- and higher-body effects on the first solvation shell to be studied. Calculations are also performed with some of the best-known expressions and parameters available in the literature.

INTRODUCTION 

Monte Carlo and Molecular Dynamics calculations on systems bound by intermolecular interactions are generally based on analytical expressions describing the interaction between two, or possibly three, molecules. In most cases, the energy of the total system is computed as a sum of intermolecular energies between pairs of molecules, within the "pair approximation", which generally assumes pairwise additivity. The corresponding parameters may be adjusted either to reproduce experimental data or to fit previous calculations of energies and properties of specific geometrical configurations of the bimolecular systems. In the first case, the available data generally include three- and higher-body contributions, which introduces some artefacts into the determination of the parameters; the result is an "effective" potential, in which three- and higher-body terms are assumed to be partly taken into account through two-body expressions.
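To make the pair approximation concrete, here is a minimal sketch (not the authors' code) of a cluster energy built as a sum over all molecule pairs, with an optional non-additive three-body term. Molecules are reduced to single points, and the 12-6 pair function and its parameters are hypothetical, chosen only for illustration.

```python
import itertools
import math


def total_energy(positions, pair_energy, three_body_energy=None):
    """Cluster interaction energy.

    positions: one (x, y, z) point per molecule (a single-site simplification).
    pair_energy(a, b): two-body interaction between molecules a and b.
    three_body_energy(a, b, c): optional non-additive correction; when None,
        the result is the pure pair (pairwise-additive) approximation.
    """
    energy = sum(pair_energy(a, b) for a, b in itertools.combinations(positions, 2))
    if three_body_energy is not None:
        energy += sum(three_body_energy(a, b, c)
                      for a, b, c in itertools.combinations(positions, 3))
    return energy


def toy_pair(a, b, eps=0.5, sigma=3.0):
    """Hypothetical 12-6 pair potential (kcal/mol, Angstrom), for illustration only."""
    x = sigma / math.dist(a, b)
    return 4.0 * eps * (x**12 - x**6)


cluster = [(0.0, 0.0, 0.0), (3.2, 0.0, 0.0), (1.6, 2.8, 0.0)]
print("pair approximation:", total_energy(cluster, toy_pair))
```

An N-molecule cluster has N(N-1)/2 pairs but N(N-1)(N-2)/6 triplets, so adding the three-body term is also where most of the extra cost lies.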

Many expressions and parameters have been proposed for the water system (refs. 1-7). Comparisons are available, but they generally concern large systems and averaged values, so that it is difficult to assess their validity for specific geometries. In the present work, quantum mechanical calculations are presented for four geometries of clusters involving six water molecules. These energies are compared with those obtained from some of the best-known expressions and parameters available in the literature.

Some parameters describing the interaction between H3O+ and a water molecule have been proposed previously (refs. 8-11). In the present work, ab initio SCF and dispersion energy calculations are compared, for some specific geometries, with some expressions available in the literature.
Three-body forces also deserve special attention. Their effect on Monte Carlo aggregation energies (ref. 11) and on quantum mechanical calculations performed on some of the most stable configurations of small clusters (ref. 12) has been shown previously. In the present work, this effect is also studied through quantum mechanical calculations on specific geometries of clusters involving 6 water molecules. This gives useful information on how the first solvation shell is filled. As will be seen, neglecting three-body contributions can artificially favor some geometries.

DETAIL  OF  THE  CALCULATIONS 

When Monte Carlo results are mentioned, we refer to calculations (refs. 13-14) based on an atom-atom (12-1-6) expression describing the interaction between the ion and each water molecule, the water-water interaction being evaluated from the MCY potential (ref. 4). The effect of three-body forces, described by an atom-atom expression, has also been investigated (refs. 11-12). Two geometries of the oxonium ion have been considered: a planar one, PL, with d(OH) = 0.965 Å and θ = 120° (ref. 9), and a slightly pyramidal one, PYR, with d(OH) = 0.959 Å and θ = 113.5° (ref. 15). Two sets of parameters have been used for the ion-water interaction, denoted by (12-1)(6) and (12-1-6), respectively: in the first case, the parameters have been adjusted to represent separately the SCF and dispersion energies computed for some geometrical configurations of the bimolecular system; in the second case, the parameters are adjusted to reproduce the total energies (these procedures are discussed in refs. 11 and 16).
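The general shape of an atom-atom (12-1-6) expression can be sketched as follows: each pair of atoms, one on the ion and one on the water molecule, contributes a repulsive r^-12 term, a Coulomb r^-1 term between atomic charges and an attractive r^-6 dispersion term, and the molecule-molecule energy is the sum over all such atom pairs. The (12-1)(6) and (12-1-6) sets described above share this form and differ in how the coefficients are fitted. The charges, coefficients and coordinates below are placeholders, not the fitted values of refs. 11 and 16.

```python
import math

COULOMB_KCAL = 332.0637  # kcal*Angstrom/(mol*e^2): q_i*q_j/r in e^2/Angstrom -> kcal/mol


def atom_atom_12_1_6(r, a_rep, q_i, q_j, c_disp):
    """One atom-atom term: A/r^12 + q_i*q_j/r - C/r^6 (kcal/mol, r in Angstrom)."""
    return a_rep / r**12 + COULOMB_KCAL * q_i * q_j / r - c_disp / r**6


def molecule_pair_energy(sites_1, sites_2, params):
    """Sum the atom-atom terms over every site of molecule 1 and every site of molecule 2.

    sites_k: list of (element, charge, (x, y, z)).
    params[(element_1, element_2)] -> (A, C), hypothetical coefficients.
    """
    energy = 0.0
    for el_1, q_1, xyz_1 in sites_1:
        for el_2, q_2, xyz_2 in sites_2:
            a_rep, c_disp = params[(el_1, el_2)]
            energy += atom_atom_12_1_6(math.dist(xyz_1, xyz_2), a_rep, q_1, q_2, c_disp)
    return energy


# Placeholder two-site fragments and coefficients, for illustration only
ion = [("O", -0.4, (0.0, 0.0, 0.0)), ("H", 0.6, (0.96, 0.0, 0.0))]
water = [("O", -0.8, (2.6, 0.0, 0.0)), ("H", 0.4, (3.2, 0.8, 0.0))]
params = {
    ("O", "O"): (5.0e5, 600.0), ("O", "H"): (1.0e4, 50.0),
    ("H", "O"): (1.0e4, 50.0), ("H", "H"): (1.0e3, 10.0),
}
print(molecule_pair_energy(ion, water, params))
```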

An important feature of Monte Carlo calculations based on the pair approximation is that, among the most stable configurations generated, either 3 or 4 water molecules can be found in the first solvation shell when the cluster contains at least 6 water molecules. Special attention is therefore paid to such configurations. SCF ab initio calculations have been performed (using a vectorized version of the program ASTERIX developed in Strasbourg) with the two geometries of the ion and structures involving 3 or 4 water molecules in the first shell of H3O+(H2O)6 clusters. These structures are denoted by (3+3)PL, (4+2)PL, (3+3)PYR and (4+2)PYR, the notation (4+2) meaning that 4 water molecules are in the first shell and the other 2 in outer shells. DZP basis sets are used (ref. 9), and the SCF intermolecular values are corrected for the basis set superposition error (BSSE) using the counterpoise correction (ref. 17). To evaluate the three- and higher-body contribution, TB, SCF calculations have been performed on the total supermolecule (SCF(SM)) and on all pairs of molecules (SCF(bi)), so that TB = SCF(SM) - SCF(bi). The dispersion energy is derived from our previous approximation EK (ref. 9) and from the expansion RW proposed by Wormer et al. (ref. 18), based on accurate ab initio quantum mechanical calculations in the perturbation scheme and the multipole expansion. More details on this part of the work are given in ref. 19.
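The counterpoise bookkeeping can be sketched in its standard Boys-Bernardi form: each monomer energy is recomputed in the full dimer basis (ghost functions on the partner), so that dimer and monomer energies share one basis. The total energies below are invented round numbers in hartree, used only to show the arithmetic.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree


def interaction_energy(e_dimer, e_mono_a, e_mono_b):
    """E(AB) - E(A) - E(B), converted from hartree to kcal/mol."""
    return (e_dimer - e_mono_a - e_mono_b) * HARTREE_TO_KCAL


# Hypothetical SCF total energies (hartree)
E_AB = -152.10000                             # dimer in the dimer basis
E_A_OWN, E_B_OWN = -76.04000, -76.04000       # monomers in their own basis
E_A_GHOST, E_B_GHOST = -76.04100, -76.04080   # monomers in the full dimer basis

uncorrected = interaction_energy(E_AB, E_A_OWN, E_B_OWN)
corrected = interaction_energy(E_AB, E_A_GHOST, E_B_GHOST)  # counterpoise-corrected
bsse = uncorrected - corrected
print(f"uncorrected {uncorrected:.2f}, counterpoise {corrected:.2f}, BSSE {bsse:.2f} kcal/mol")
```

Because the ghost functions can only lower the monomer energies, the corrected interaction energy is less attractive than the uncorrected one.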

Calculations have also been performed on the 4 structures described above using some expressions and parameters available in the literature. For the ion-water interaction they are denoted by EK, GGB and BB, corresponding to refs. 9, 10 and 8, respectively. For the widely studied water-water system, we generally use the usual notation when available and the initials of the authors otherwise. The parameters MCY (ref. 4), AFHF (ref. 3), SPC (ref. 5), BF (ref. 1), TIP3P (ref. 7), TIPS2 (ref. 7), TIP4P (ref. 7), ST2 (ref. 2), CH (ref. 6) and RW (ref. 18) have been considered. When the dispersion energy is taken from different references for ion-water and water-water pairs, we use the notation dispIW(XX) or dispWW(YY), XX or YY referring to the expressions and parameters described above. Note that the RW parameters are computed directly from ab initio quantum mechanical calculations; the EK, MCY, AFHF and CH parameters are adjusted to reproduce ab initio potential energy surfaces, while experimental data are taken into account in the other cases, which introduces semi-empirical considerations.

THE  RESULTS 

Three-body  forces  effect  on  the  clustering  energies 

Table 1 shows the effect of the energy parameters on Monte Carlo clustering energies of small clusters of the H3O+(H2O)n system, with special attention paid to the three-body contribution. The two series of parameters, (12-1)(6) and (12-1-6), and the two geometries of the ion (planar or pyramidal) described above are considered (ref. 11).

Our previous values (ref. 20) were obtained with a (12-1)(6) fit but different C6 parameters. Within the pair approximation, using the (12-1)(6) or (12-1-6) parameters makes only minor changes, and both geometries give similar qualitative results. Three-body forces (column 3B), however, strongly affect the clustering energies of small clusters, giving values in much better agreement with experimental data. The value for n = 2 remains larger in magnitude than the experimental one; it would be interesting to check whether a tunneling process could be responsible for this effect. As seen in ref. 11, three-body effects seem much less important in larger systems.

The three-body effect is particularly pronounced in this system; it is much weaker in calcium ion clusters (ref. 16).
TABLE 1

H3O+(H2O)n: Monte Carlo clustering energies En - En-1 (kcal/mol), T = 300 K

     Planar                                         Pyramidal             Experimental
n    (12-1)(6)  (12-1)(6) 3B  (12-1-6)  Ref. (20)   (12-1)(6)  (12-1-6)   Ref. (21)  Ref. (22)
1    -30.08     -             -29.17    -29.05      -32.41     -30.85     -31.6      -31.8
2    -28.72     -24.41        -27.83    -27.58      -30.71     -29.34     -19.5      -19.0
3    -27.26     -19.68        -26.38    -26.25      -28.94     -27.69     -17.9      -17.6

3B: three-body forces included.
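The comparison made in the text can be checked directly from Table 1. The sketch below uses the planar-ion (12-1)(6) columns with and without three-body forces (n = 1 is left out because no 3B value is given) and measures each against the mean of the two experimental values; it is only arithmetic on the tabulated numbers.

```python
# Stepwise clustering energies E_n - E_(n-1), kcal/mol, planar ion, from Table 1
pair_only = {2: -28.72, 3: -27.26}        # (12-1)(6), pair approximation
with_three_body = {2: -24.41, 3: -19.68}  # (12-1)(6) + 3B
experiment = {2: (-19.5, -19.0), 3: (-17.9, -17.6)}  # refs. 21 and 22


def deviation(calc, n):
    """Calculated minus mean experimental value; negative means too attractive."""
    exp_values = experiment[n]
    return calc[n] - sum(exp_values) / len(exp_values)


for n in (2, 3):
    print(f"n={n}: pair {deviation(pair_only, n):+.2f}, "
          f"with 3B {deviation(with_three_body, n):+.2f} kcal/mol")
```

The remaining gap of about 5 kcal/mol at n = 2 is the discrepancy the text proposes to examine in terms of a tunneling process.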

Three-body  forces  effect  on  the  filling  of  the  first  solvation  shell 

We previously noted (ref. 20) that the most stable configurations generated in Monte Carlo calculations based on the pair approximation have 3 water molecules in the first solvation shell up to n = 5, as could be expected from the formation of hydrogen bonds with the three H atoms of the ion. However, from n = 6 onwards, we found configurations of similar energies with either 3 or 4 water molecules in the first shell. We proposed (ref. 20) that this fourth water molecule can be explained by an exchange process through one of the three hydrogen bonds. It is therefore interesting to study the three-body effect on such structures. As described above, and in more detail in ref. 19, ab initio SCF calculations have been performed on the structures (3+3)PL, (4+2)PL, (3+3)PYR and (4+2)PYR of the H3O+(H2O)6 system. Table 2 shows the three- and higher-body contribution in the whole system, TB, obtained as the difference between SCF(SM) and SCF(bi). This contribution, rather similar for both geometries of the ion, is clearly different for the (3+3) and (4+2) structures (about 9-10 versus 17 kcal/mol), so that the four structures have close energies in the pair approximation (SCF(bi)), while the (3+3) geometries are more stable than the (4+2) ones by 7-9 kcal/mol in the supermolecule treatment (SCF(SM)). Since the three-body contribution arises mainly at the SCF level, neglecting this effect clearly favors the (4+2) structures.


TABLE 2

H3O+(H2O)6: SCF energies (kcal/mol)

                          3+3 PL    4+2 PL    3+3 PYR   4+2 PYR
Total system
  SCF(SM)                 -96.9     -89.3     -95.2     -86.8
  SCF(bi)                 -105.6    -106.6    -105.1    -103.8
  TB                      8.7       17.3      9.8       17.0
Separated systems
  Σ I-W: SCF(bi)          -102.2    -113.1    -95.1     -106.7
  Σ I-W: SCF(bi) + TB     -93.5     -95.7     -85.2     -89.7
  Σ W-W: SCF(bi)          -3.3      6.4       -10.0     2.9

We may assume that this many-body contribution is mainly due to ion-water-water interactions, and decompose the total energy into ion-water (Σ I-W) and water-water (Σ W-W) components. At the SCF level, the ion-water contribution always favors the (4+2) structures, even when three- and higher-body terms are taken into account, while the water-water component favors the (3+3) ones. A correct description of the total system therefore requires good accuracy on both components.
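The bookkeeping behind Table 2 and the decomposition just described can be reproduced in a few lines: TB is the difference SCF(SM) - SCF(bi), and the pair energy splits into its ion-water and water-water parts. Recomputed values may differ from the tabulated ones by 0.1 kcal/mol because the published energies are rounded.

```python
# SCF energies for H3O+(H2O)6, kcal/mol, from Table 2
structures = ["(3+3)PL", "(4+2)PL", "(3+3)PYR", "(4+2)PYR"]
scf_sm = [-96.9, -89.3, -95.2, -86.8]        # supermolecule
scf_bi = [-105.6, -106.6, -105.1, -103.8]    # sum of pair energies
ion_water = [-102.2, -113.1, -95.1, -106.7]  # Σ I-W at SCF(bi)
water_water = [-3.3, 6.4, -10.0, 2.9]        # Σ W-W at SCF(bi)

# Three- and higher-body contribution, TB = SCF(SM) - SCF(bi)
tb = [sm - bi for sm, bi in zip(scf_sm, scf_bi)]

# The pair energy decomposes into ion-water and water-water parts
# (agreement to about 0.1 kcal/mol, the rounding of the table)
for iw, ww, bi in zip(ion_water, water_water, scf_bi):
    assert abs(iw + ww - bi) < 0.15

for name, t, sm, bi in zip(structures, tb, scf_sm, scf_bi):
    print(f"{name:9s} TB = {t:5.1f}  SCF(bi) = {bi:7.1f}  SCF(SM) = {sm:6.1f}")

# (3+3) minus (4+2) energy for each ion geometry, pair vs supermolecule treatment
for geom, i3, i4 in (("PL", 0, 1), ("PYR", 2, 3)):
    print(geom, "pair:", round(scf_bi[i3] - scf_bi[i4], 1),
          "supermolecule:", round(scf_sm[i3] - scf_sm[i4], 1))
```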

Comparison of expressions and parameters: H3O+(H2O)6 structures

Many expressions and parameters have been proposed in the literature for the water-water energy. At the SCF level (we call "SCF level" any expression that does not include the dispersion term), Table 3 shows a wide variety of values. In all cases, however, the most stable structure is (3+3)PYR, followed by (3+3)PL, the other two being generally repulsive, except in the ST2 calculations. Note, however, that the repulsive energy of the (4+2)PYR structure varies from 2.9 to 13.7 kcal/mol. Given the accuracy requirement discussed above, this situation is rather worrying.
TABLE 3

H3O+(H2O)6: Water-water molecules contribution, SCF level (no dispersion energy) (a)

Σ(W-W)            3+3 PL    4+2 PL    3+3 PYR   4+2 PYR
SCF(bi) (b)       -3.3      6.4       -10.0     2.9
SPC (b)           -1.6      12.1      -6.2      13.7
SPC (c)           -4.2      9.6       -10.1     10.2
BF (b)            -3.7      7.2       -8.5      7.4
BF (c)            -4.0      6.8       -9.0      6.8
TIP3P             -2.6      11.3      -7.6      12.4
TIPS2 (b,c)       -4.0      9.1       -9.6      9.5
TIP4P (b,c)       -4.5      7.6       -10.0     7.5
ST2 (c)           -8.3      1.1       -16.0     -4.6

a) See the text for the notations
b) Molecular geometries described in the present work
c) Molecular geometries described in the corresponding reference
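The spread quoted above for the (4+2)PYR structure follows directly from the last column of Table 3; the sketch below simply collects those entries and separates the repulsive ones from the attractive ST2 value.

```python
# Σ W-W at the SCF level for the (4+2)PYR structure, kcal/mol, from Table 3
four_plus_two_pyr = {
    "SCF(bi)": 2.9, "SPC (b)": 13.7, "SPC (c)": 10.2, "BF (b)": 7.4,
    "BF (c)": 6.8, "TIP3P": 12.4, "TIPS2": 9.5, "TIP4P": 7.5, "ST2": -4.6,
}

repulsive = {name: e for name, e in four_plus_two_pyr.items() if e > 0.0}
low = min(repulsive, key=repulsive.get)
high = max(repulsive, key=repulsive.get)
print(f"repulsive range: {repulsive[low]} ({low}) to {repulsive[high]} ({high}) kcal/mol")
print("attractive:", [name for name, e in four_plus_two_pyr.items() if e <= 0.0])
```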

TABLE 4

H3O+(H2O)6: Water-water molecules contribution, dispersion energy (a)

Σ(W-W)            3+3 PL    4+2 PL    3+3 PYR   4+2 PYR
SPC               -4.9      -6.8      -7.3      -9.2
BF                -6.6      -9.1      -5.7      -12.4
TIP3P             -4.7      -6.5      -6.9      -8.8
TIPS2             -4.7      -6.5      -7.0      -8.9
TIP4P             -4.8      -6.6      -7.1      -9.0
ST2               -2.1      -2.9      -3.1      -4.0
RW (d, e)         -4.6      -6.8      -6.9      -9.1
RW (d, f)         -5.7      -8.3      -8.4      -11.2
RW (d, g)         -11.3     -20.8     -16.9     -29.5

d) Ref. 19
e) TDCHF: C6 R^-6 term
f) MBPT: C6 R^-6 term
g) TDCHF: C6 R^-6 to C10 R^-10 terms


The dispersion energies presented in Table 4 correspond to the leading term C6 R^-6, except for the last line of the RW development (ref. 19), which includes terms up to R^-10. All semi-empirical determinations are quite close to one another, although the BF results are somewhat larger and the ST2 values somewhat smaller. These values are very similar to those obtained from the quantum mechanical development of Rijks and Wormer in the TDCHF approximation, while the MBPT values are larger by more than 20%. However, when the higher-order terms are included (last line of Table 4), the results may increase by a factor of 3. These last values are overestimated, because the overlap effect is neglected in this region, but we can expect the true results to be in any case larger than the values usually proposed on the basis of the leading term without overlap correction: the overestimation due to the charge overlap effect is most probably smaller than the contribution of the higher-order terms. Combined with the remarks about the SCF-level contribution (Table 3), this leads us to question the accuracy of such calculations.

For the shortest water-water intermolecular distances, we noted (ref. 19) that the C10 R^-10 term becomes slightly dominant in the RW expansion. This may explain why hydrogen bonds are sometimes described by a (12-10) expression (ref. 23). However, such an expression can at best be used only for very specific intermolecular (or interatomic) distances: the contribution of the other terms is artificially absorbed into a modified C10 coefficient, and the values the expression gives at other intermolecular distances could be very poor.
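The point about (12-10) expressions can be illustrated with a truncated multipole dispersion series, E = -(C6/R^6 + C8/R^8 + C10/R^10), without damping or overlap correction. The coefficients below are hypothetical, in arbitrary units, chosen so that the R^-10 term dominates at short range as noted for the RW expansion; a single effective C10 fitted at one distance is then compared with the full series at larger distances.

```python
# Hypothetical dispersion coefficients, arbitrary units (not the RW values)
C6, C8, C10 = 1.0, 2.0, 5.0


def dispersion(r, c6=C6, c8=C8, c10=C10):
    """Undamped multipole dispersion series truncated at R^-10."""
    return -(c6 / r**6 + c8 / r**8 + c10 / r**10)


def effective_c10(r_fit):
    """C10 of a lone R^-10 term forced to reproduce the full series at r_fit."""
    return -dispersion(r_fit) * r_fit**10


r_fit = 1.0  # short distance where the terms are 1.0, 2.0 and 5.0: C10 dominates
c10_eff = effective_c10(r_fit)
for r in (1.0, 1.5, 2.0, 3.0):
    full = dispersion(r)
    single = -c10_eff / r**10
    print(f"R={r}: full series {full:.3e}, single R^-10 term {single:.3e}, "
          f"ratio {full / single:.1f}")
```

The lone R^-10 term matches the series exactly at the fitting distance but underestimates the attraction by more than an order of magnitude at R = 3, which is the kind of failure the text warns about.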

The analysis of Tables 3 and 4 shows how delicate it is to obtain accurate energy values. From the total energy (Table 5), it is extremely difficult to decide which expansion is the best. In all cases, the most stable structure is (3+3)PYR, but the (4+2) structures have either attractive or repulsive energies depending on the model.


TABLE 5

H3O+(H2O)6: Water-water molecules energy (a)

Σ(W-W)                    3+3 PL    4+2 PL    3+3 PYR   4+2 PYR
SCF(bi) + disp(RW) (d)    -14.7     -14.4     -26.9     -26.6
SCF(bi) + disp(SPC)       -8.3      -0.4      -17.2     -6.4
SPC (b)                   -6.6      5.3       -13.5     4.4
SPC (c)                   -9.1      2.8       -17.3     1.0
MCY                       -10.6     1.3       -19.8     -6.7
CH                        -13.7     -6.1      -25.5     -14.0
AFHF                      -5.8      6.8       -12.7     1.9
BF (b)                    -10.3     -1.9      -18.2     -5.0
BF (c)                    -10.6     -2.3      -18.7     -5.6
TIP3P                     -7.3      4.8       -14.5     3.6
TIPS2 (b,c)               -8.8      2.6       -16.6     0.7
TIP4P (b,c)               -9.4      0.9       -17.1     -1.5
ST2 (c)                   -10.4     -1.8      -19.1     -8.6

a-b-c) See footnotes on Table 3; d) ref. 19


A similar situation is found for the few expressions that describe the interaction between the ion and the water molecules. This is shown in Table 6, which also reveals the importance of the molecular geometry. In particular, the EK parameters, adjusted for a planar geometry of the ion, are not very suitable for a pyramidal geometry. This highlights the difficulty of searching for the best geometry with a single set of parameters.

The discrepancies in the dispersion energy values are rather surprising. With such differences, we may question the real significance of these individual contributions, and we would rather suggest that only the total value of the energy be considered with the GGB parameters.


TABLE 6

H3O+(H2O)6: Ion-water molecules contribution (a)

Σ(I-W)                        3+3 PL      4+2 PL      3+3 PYR     4+2 PYR
SCF level
  SCF(bi) + TB                -93.5       -95.7       -85.2       -89.7
  SCF(bi)                     -102.2      -113.1      -95.1       -106.7
  EK                          -109.4      -123.5      -116.1      -128.8
  GGB (b)                     -70.1       -74.2       -66.7       -69.7
  GGB (c)                     -78.5       -81.1       -74.3       -76.9
Dispersion energy
  EK                          -4.6        -4.6        -4.5        -4.8
  GGB                         -18.1       -24.2       -19.0       -25.2
Total energy
  EK                          (illegible) -128.2      -120.7      -133.6
  GGB (b)                     -88.2       -98.3       -85.7       -94.9
  GGB (c)                     -96.6       -105.3      -93.3       -102.0
  BB                          -85.7       -96.1       -85.9       -96.9
  (row illegible; surviving values: -87.3, -97.0)
  SCF(bi) + disp(EK)          -106.9      -117.7      -99.6       -111.5
  SCF(bi) + disp(EK) + TB     -98.1       -100.3      -89.8       -94.5

a-b-c) See footnotes on Table 3


From all these considerations, it is not surprising that no definitive answer can be given, at this stage of the work, about the relative stabilities of the 4 structures studied. They are rather close in energy, and still more accurate calculations would be necessary to remove the ambiguity. The pair approximation clearly favors the (4+2) structures (Tables 2 and 7); however, the water-water dispersion energy counterbalances the three-body effect, since it favors the (4+2) configurations by about 2 kcal/mol when only the leading C6 R^-6 term is considered, and by 9-13 kcal/mol for an expansion up to the R^-10 terms in the RW treatment (Table 4). In the latter case, the total energies of the four configurations described above are (Table 7) -121.5, -132.1, -126.5 and -138.2 kcal/mol, respectively, when the three- and higher-body contribution is neglected (SCF(bi) + dispIW(EK) + dispWW(RW)), and -112.8, -114.7, -116.6 and -121.1 kcal/mol when it is taken into account (SCF(SM) + dispIW(EK) + dispWW(RW)); with the EK parameters, the ion-water dispersion energy is similar for all these configurations. As discussed above, it is difficult to state which structure is the most stable, since the dispersion energy treatment is not corrected for the charge overlap effect in this region.
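The figures quoted in this paragraph can be recovered from Tables 4 and 7. The sketch computes by how much the RW water-water dispersion favors (4+2) over (3+3) for each ion geometry, and which structure has the lowest Table 7 total with and without the three-body contribution; it is arithmetic on the published values only.

```python
structures = ["(3+3)PL", "(4+2)PL", "(3+3)PYR", "(4+2)PYR"]

# Water-water dispersion, kcal/mol, RW rows of Table 4
rw_c6_only = [-4.6, -6.8, -6.9, -9.1]        # TDCHF, C6 R^-6 term only
rw_c6_to_c10 = [-11.3, -20.8, -16.9, -29.5]  # TDCHF, C6 R^-6 to C10 R^-10

# Total energies, kcal/mol, Table 7 (dispIW(EK) + dispWW(RW))
pair_total = [-121.5, -132.1, -126.5, -138.2]  # SCF(bi): no three-body term
sm_total = [-112.8, -114.7, -116.6, -121.1]    # SCF(SM): three-body included


def favors_4_plus_2(disp):
    """(3+3) minus (4+2) dispersion per ion geometry; positive favors (4+2)."""
    return {"PL": disp[0] - disp[1], "PYR": disp[2] - disp[3]}


print("C6 only:", favors_4_plus_2(rw_c6_only))
print("C6 to C10:", favors_4_plus_2(rw_c6_to_c10))
for label, totals in (("pair", pair_total), ("supermolecule", sm_total)):
    ranked = sorted(zip(totals, structures))
    gap = ranked[1][0] - ranked[0][0]
    print(f"{label}: lowest {ranked[0][1]}, next {ranked[1][1]} ({gap:.1f} kcal/mol higher)")
```

Taken at face value, (4+2)PYR is lowest in both columns, but, as the text stresses, the undamped R^-10 dispersion makes this ranking unreliable.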
These results give a good idea of the limits of validity of the conclusions that can be drawn from calculations using such approximations as soon as unusual structures are considered.

The calculations were performed on an IBM 3090 200VF machine at the Centre de Calcul du CNRS de Strasbourg-Cronenbourg.

TABLE 7

H3O+(H2O)6: Total energy (a)

                                          3+3 PL    4+2 PL    3+3 PYR   4+2 PYR
MCY + EK                                  -124.7    -126.9    -140.5    -140.3
BB + AFHF (b)                             -91.5     -89.3     -98.6     -94.9
GGB + SPC (b)                             -94.8     -93.1     -99.2     -90.4
GGB + SPC (c)                             -105.7    -102.5    -110.7    -101.0
SCF(bi) + dispIW(EK) + dispWW(SPC) (b)    -115.5    -118.1    -116.9    -117.9
SCF(SM) + dispIW(EK) + dispWW(SPC) (b)    -106.4    -100.7    -107.0    -100.9
SCF(bi) + dispIW(EK) + dispWW(RW) (b, d)  -121.5    -132.1    -126.5    -138.2
SCF(SM) + dispIW(EK) + dispWW(RW) (b)     -112.8    -114.7    -116.6    -121.1

a-b-c) See footnotes on Table 3; d) ref. 19
DISCUSSION


Question - Is the three-body contribution always repulsive?

KOCHANSKI - Only the three- and higher-body contribution at the Self Consistent Field level is considered here, because it is much more important than the three-body dispersion energy. It is repulsive for these four geometrical configurations because the dominant water 1-ion-water 2 interactions are repulsive, water 1 and water 2 being two neighbouring water molecules of the first solvation shell. However, other terms may be attractive, in particular when one of the water molecules is in the second shell; they are generally smaller in magnitude. This means that an analytical description of the three-body contribution requires both repulsive AND attractive components.
SUMMARY

A new formalism has been developed to evaluate the intermolecular interaction energy Eint between an organometallic substrate S and an incoming reactant R within the framework of the extended Hückel method. Approximate procedures are used to estimate the electrostatic, charge transfer and exchange repulsion components of Eint, whose values are mapped as a local reactivity index onto the molecular surface of S using a standard color code. When applied to electrophilic and nucleophilic addition reactions, such as the protonation of ferrocene or the attack of arene-Cr(CO)3 by H-, this combination of quantum chemistry and molecular graphics techniques describes the high regioselectivity of these reaction mechanisms.

INTRODUCTION 

Molecular graphics (MG), which can be defined as the application of computer graphics techniques to the investigation of molecular structure, function and interaction, is used today on a routine basis to build, represent and manipulate three-dimensional (3D) models of molecular structures and properties (ref. 1). As rapid access to molecular geometry is a prerequisite, efficient procedures for structure and substructure searching within crystallographic databases have been proposed (ref. 2). When no structural data are available, several powerful model builders based on molecular mechanics can be used in standard molecular modeling packages (ref. 3).

The calculation of molecular properties such as intermolecular interaction energies, however, usually requires time-consuming quantum chemical methods, and faster alternatives must be derived for MG applications. For example, the well-known molecular electrostatic potential (MEP) (ref. 4) may now be approximated by multipolar expansions of the electron density (ref. 5). This considerably reduces the computational effort required to map MEP values onto molecular surfaces (ref. 6). As a result, the MEP model has met with considerable success in modeling the complex situations arising from organic reaction mechanisms (ref. 7) or drug design applications (ref. 8). However, we have recently found (ref. 9) that the electrostatic component alone is not sufficient, even qualitatively, for a proper description of reaction mechanisms involving an organometallic substrate S and an incoming reactant R with electrophilic or nucleophilic character. In such cases, there is generally significant charge transfer between the d shell of the transition metal atom of S and the orbitals of R, and it is essential to take into account the corresponding energy, which can be of the same order of magnitude as the electrostatic component. As the charge transfer mechanism is essentially due to orbital overlap, this energy component is difficult to parametrize with an empirical potential. Instead, a quantum chemical calculation of the S-R supermolecule should be performed, which is considerably more demanding in terms of computer time, especially as this calculation must be repeated for all possible locations of R on the molecular surface of S. Clearly, in view of the short response times required by MG applications, it was important to use an efficient semi-empirical quantum chemical method to evaluate the charge transfer component. This is why we turned to the extended Hückel (EH) method (ref. 10), which is known to predict the electronic structure of transition metal complexes quickly and reasonably well.

In  this  paper,  after  a  brief  description  of  the  model  we  have  developed,  some  recent 
results  obtained  for  both  electrophilic  and  nucleophilic  additions  to  organometallic  species 
will  be  reported  and  discussed. 

METHOD 

Within our model, the S-R interaction energy Eint(r) is expressed as a sum of several components:

$$E_{\mathrm{int}}(\mathbf{r}) = E_{\mathrm{es}}(\mathbf{r}) + E_{\mathrm{ct}}(\mathbf{r}) + E_{\mathrm{ex}}(\mathbf{r}) \qquad (1)$$

where r specifies the position of the incoming electrophilic or nucleophilic reactant in the vicinity of the organometallic substrate, and Ees, Ect and Eex are the electrostatic, charge transfer and exchange energy components, respectively.
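Equation (1) and the use of Eint as a reactivity index can be sketched as below: the three components are summed at each surface point and the points are ranked by Eint, the most negative being the preferred sites of attack. The point labels and component values are hypothetical, chosen to show how adding a charge-transfer term can move the predicted site away from the electrostatic minimum; in the actual model the components come from the EH-based procedures described in the text.

```python
from dataclasses import dataclass


@dataclass
class SurfacePoint:
    label: str
    e_es: float  # electrostatic component
    e_ct: float  # charge-transfer component
    e_ex: float  # exchange-repulsion component

    @property
    def e_int(self) -> float:
        # Equation (1): E_int = E_es + E_ct + E_ex
        return self.e_es + self.e_ct + self.e_ex


def preferred_sites(points, k=1):
    """The k points with the most negative E_int (most reactive towards R)."""
    return sorted(points, key=lambda p: p.e_int)[:k]


# Hypothetical component values (kcal/mol) at three surface points
points = [
    SurfacePoint("A", e_es=-10.0, e_ct=-2.0, e_ex=3.0),
    SurfacePoint("B", e_es=-6.0, e_ct=-12.0, e_ex=4.0),
    SurfacePoint("C", e_es=-3.0, e_ct=-1.0, e_ex=1.0),
]
print("electrostatics only:", min(points, key=lambda p: p.e_es).label)
print("full E_int:", preferred_sites(points)[0].label)
```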

To display molecular surfaces color-coded according to Eint values used as a reactivity index, Eint is evaluated repeatedly at selected points r located on the molecular surface of substrate S. The number of points depends, of course, on the size and complexity of the substrate; in the cases presented here it varies between 4000 and 6000. Negative (respectively positive) values of Eint correspond to attractive (repulsive) S-R interactions, and the regions where Eint is at a minimum are the sites of S most reactive towards attack by R. In all cases, the color scale from red to yellow to blue extends smoothly over the numerical range of Eint, from the most negative values through zero to the most positive values, so that the red zones correspond to preferred sites of attack. In this paper, however, the need for monochrome figures has led us to suppress the shading of the surfaces and to map the red-yellow-blue color scale onto the black-grey-white range.
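One plausible way to implement the color coding described above is a piecewise-linear map: red at the most negative Eint, yellow at zero and blue at the most positive value, with black-grey-white as the monochrome counterpart. The linear interpolation and the RGB endpoints are assumptions; the text only specifies the ordering of the colors.

```python
RED, YELLOW, BLUE = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)


def _lerp(start, end, t):
    """Linear interpolation between two RGB triples, t in [0, 1]."""
    return tuple(s + t * (f - s) for s, f in zip(start, end))


def reactivity_color(e_int, e_min, e_max):
    """RGB for E_int: red at e_min (< 0), yellow at 0, blue at e_max (> 0)."""
    if e_int <= 0.0:
        return _lerp(YELLOW, RED, min(e_int / e_min, 1.0))
    return _lerp(YELLOW, BLUE, min(e_int / e_max, 1.0))


def grey_level(e_int, e_min, e_max):
    """Monochrome counterpart: 0 (black) at e_min, 0.5 (grey) at 0, 1 (white) at e_max."""
    if e_int <= 0.0:
        return 0.5 * (1.0 - min(e_int / e_min, 1.0))
    return 0.5 + 0.5 * min(e_int / e_max, 1.0)


# Hypothetical E_int range over the surface points, kcal/mol
e_min, e_max = -40.0, 25.0
for value in (-40.0, -20.0, 0.0, 12.5, 25.0):
    print(value, reactivity_color(value, e_min, e_max), grey_level(value, e_min, e_max))
```

Scaling each side of zero separately keeps zero at yellow (grey) even when the negative and positive ranges differ, which matches the description of red zones as attractive and blue zones as repulsive.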

It is important to choose a simple though realistic model for the reactant R, as the computer
time required to evaluate $E_{\text{int}}$ increases rapidly with the complexity of R.


In order to have $E_{\text{int}}$ values that depend only on the position of R on the molecular surface
of S, and not on its orientation, two spherically symmetric model reactants have been
chosen: a proton with a virtual 1s orbital for the electrophile and a hydride ion H⁻ with two
1s electrons for the nucleophile. Let us now turn to the approximations we have used for
the various components of $E_{\text{int}}$.


Electrostatic component

In the case of an electrophilic attack, $E_{\text{es}}$ is equal to the MEP of substrate S:

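$$E_{\text{es}}(\mathbf{r}) = \sum_A \frac{Z_A}{|\mathbf{r} - \mathbf{r}_A|} - \sum_\mu \sum_\nu P_{\mu\nu}\,\langle \mu | 1/r | \nu \rangle \qquad (2)$$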


where the first term corresponds to nuclear repulsion, the summation running over all
atoms A of S, with nuclear charge $Z_A$ located at $\mathbf{r}_A$. The second term originates from
electronic attraction, $P_{\mu\nu}$ being the first-order density matrix element corresponding to the
atomic orbitals (AOs) $\chi_\mu$ and $\chi_\nu$, and $\langle \mu | 1/r | \nu \rangle$ being defined as


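$$\langle \mu | 1/r | \nu \rangle = \int \chi_\mu(\mathbf{r}')\,\frac{1}{|\mathbf{r} - \mathbf{r}'|}\,\chi_\nu(\mathbf{r}')\,d\mathbf{r}' \qquad (3)$$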


In the case of nucleophilic attack, we may reasonably assume that the electrostatic
interaction between S and the H⁻ ion reduces to that between S and a negative point
charge, which is correct for rather large S-R distances where the so-called
penetration integrals vanish.

Several test calculations have shown this to be the case for S-R distances roughly 1 Å
larger than the usual van der Waals radii of the atoms of S. This explains why, for nucleophilic
attack, we use molecular surfaces of the substrates generated from atomic spheres
larger (typically by 1 Å) than the standard van der Waals ones. On such surfaces,
the electrostatic component for nucleophilic attack is therefore given by $-E_{\text{es}}$ (eqn. (2)).
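The sign convention for the two probes can be illustrated with a deliberately crude model. The sketch below replaces the electronic term of eqn. (2) by atomic point charges $q_A = Z_A - N_A$ (a monopole approximation, reasonable only where penetration integrals vanish, as argued above). It is not the NDDO evaluation used in the model; the net charges and function names are hypothetical inputs, and all quantities are in atomic units.

```python
import numpy as np


def mep_point_charges(points, coords, net_charges):
    """Monopole approximation of eqn. (2): sum_A q_A / |r - r_A| at each point.

    points: (n, 3) surface points; coords: (m, 3) atomic positions;
    net_charges: (m,) net atomic charges q_A = Z_A - N_A (hypothetical input,
    e.g. from a population analysis).
    """
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    xyz = np.asarray(coords, dtype=float)
    q = np.asarray(net_charges, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - xyz[None, :, :], axis=-1)
    return (q[None, :] / dist).sum(axis=1)


def electrostatic_component(points, coords, net_charges, attack):
    """E_es for a unit positive (proton) or unit negative (H-) point probe."""
    v = mep_point_charges(points, coords, net_charges)
    if attack == "electrophilic":
        return v       # proton: E_es is the MEP itself
    if attack == "nucleophilic":
        return -v      # negative point charge: E_es = -MEP
    raise ValueError(f"unknown attack type: {attack!r}")
```

For nucleophilic attack the `points` would be taken on the enlarged (about 1 Å) surface described above, where the point-charge picture is expected to hold.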

As the calculation of the $\langle \mu | 1/r | \nu \rangle$ integrals in the EH basis of atomic orbitals is
time-consuming, since these orbitals are of Slater type, we used the neglect of diatomic differential
overlap (NDDO) approximation, according to which the second right-hand term of eqn. (2)
becomes:

$$\sum_\mu \sum_\nu P_{\mu\nu}\,\langle \mu | 1/r | \nu \rangle = \sum_A \sum_{\mu \in A} \sum_{\nu \in A} P_{\mu\nu}\,\langle \mu_A | 1/r | \nu_A \rangle \qquad (4)$$

The first summation is over all atoms A of S. However, this requires the evaluation of
the reduced density matrix using orthogonalized AOs.
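The text does not say how the AOs are orthogonalized. A common choice, assumed in this sketch, is Löwdin's symmetric orthogonalization: the density matrix $P$ in the non-orthogonal EH basis (normalized so that $\mathrm{Tr}(PS)$ equals the number of electrons) becomes $P' = S^{1/2} P S^{1/2}$, and the one-centre blocks $P'_{AA}$ are what the NDDO sum of eqn. (4) needs. The function names are hypothetical.

```python
import numpy as np


def sym_sqrt(s):
    """S^(1/2) of a symmetric positive-definite overlap matrix."""
    w, v = np.linalg.eigh(s)
    return (v * np.sqrt(w)) @ v.T


def lowdin_density(p, s):
    """Density matrix in Löwdin-orthogonalized AOs: P' = S^(1/2) P S^(1/2)."""
    s_half = sym_sqrt(np.asarray(s, dtype=float))
    return s_half @ np.asarray(p, dtype=float) @ s_half


def one_centre_blocks(p_orth, ao_atom):
    """Diagonal blocks P'_AA (mu and nu both on atom A) kept by the NDDO sum.

    ao_atom[k] is the index of the atom carrying AO k.
    """
    ao_atom = np.asarray(ao_atom)
    blocks = {}
    for atom in np.unique(ao_atom):
        idx = np.flatnonzero(ao_atom == atom)
        blocks[int(atom)] = p_orth[np.ix_(idx, idx)]
    return blocks
```

The trace is preserved by construction ($\mathrm{Tr}\,P' = \mathrm{Tr}(PS)$), so the diagonal of $P'$ can be read directly as AO populations in the orthogonalized basis.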


It has recently been pointed out by Brown et al. (ref. 11) that organometallic substrates
are often characterized by bands of closely spaced energy levels in both the HOMO and
LUMO regions. This makes the use of frontier orbitals alone questionable when evaluating orbital or
charge transfer effects. Instead, Brown et al. have suggested replacing
the perturbational treatment by a complete EH calculation of the S-R supermolecule; the
so-called "orbital interaction energy" is then obtained as the difference between the total
energy of the supermolecule and those of the separate fragments. We have used the
same approach in our model.

As EH total energies fairly comprehensively represent the sum of the covalent energies
within the chemical bonds, the S-R charge transfer energy may be approximated as:

$$E_{\text{ct}} = E_{\text{tot}}(\mathrm{S{-}R}) - E_{\text{tot}}(\mathrm{S}) - E_{\text{tot}}(\mathrm{R}) \qquad (5)$$

where $E_{\text{tot}}(X)$ represents the EH total energy of system X, calculated as

$$E_{\text{tot}}(X) = \sum_i n_i\,\varepsilon_i \qquad (6)$$

with $n_i$ and $\varepsilon_i$ being, respectively, the occupation number and energy of the $i$th MO of X.

$E_{\text{ct}}(\mathbf{r})$ is then readily obtained by positioning the reactant R at selected points $\mathbf{r}$ on the
molecular surface of S and using expression (5).
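Expressions (5) and (6) translate directly into code once the EH orbital energies of the supermolecule and of the fragments are known. This sketch assumes closed-shell aufbau filling (two electrons per MO from the lowest up, a single electron in the last MO for an odd count); the EH diagonalization itself is not shown, and the function names are hypothetical. Energies are in whatever unit the orbital energies use (eV for EH).

```python
import numpy as np


def eh_total_energy(orbital_energies, n_electrons):
    """Eq. (6): E_tot = sum_i n_i * eps_i with aufbau occupations.

    MOs are filled from the lowest energy up with two electrons each; an odd
    electron goes singly into the next MO (assumption of this sketch).
    """
    eps = np.sort(np.asarray(orbital_energies, dtype=float))
    occ = np.zeros_like(eps)
    n_doubly, n_single = divmod(int(n_electrons), 2)
    if n_doubly + n_single > eps.size:
        raise ValueError("not enough orbitals for this electron count")
    occ[:n_doubly] = 2.0
    if n_single:
        occ[n_doubly] = 1.0
    return float(occ @ eps)


def charge_transfer_energy(eps_sr, eps_s, eps_r, n_s, n_r):
    """Eq. (5): E_ct = E_tot(S-R) - E_tot(S) - E_tot(R).

    n_r = 0 for the proton probe, n_r = 2 for the hydride probe.
    """
    return (eh_total_energy(eps_sr, n_s + n_r)
            - eh_total_energy(eps_s, n_s)
            - eh_total_energy(eps_r, n_r))
```

For the proton probe $E_{\text{tot}}(\mathrm{R}) = 0$, so $E_{\text{ct}}$ is simply the lowering of the occupied levels of S caused by mixing with the empty 1s orbital.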


In the case of electrophilic attack there is no exchange component in our model, as the
reactant has no electrons; the situation is different, however, for nucleophilic attack. In this
case, the exchange term, which describes the short-range repulsion due to the overlap of
the S and R electron distributions, is simply chosen to be zero outside the molecular
surface of S and infinite on the surface itself (hard-sphere approximation). This means
that the minima of $E_{\text{es}} + E_{\text{ct}}$ on this surface are automatically taken as the most reactive
sites. The approximation is justified by the $1/r^{12}$ behavior of the short-range
repulsion component, which leads to a very steep function close to the nuclei.
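In practice, the hard-sphere rule means that locating the preferred sites is just a ranking of the surface points. A minimal sketch, assuming both components have already been evaluated at the same surface points (for nucleophilic attack with the sign of $E_{\text{es}}$ already reversed); the function name is hypothetical.

```python
import numpy as np


def rank_surface_sites(points, e_es, e_ct, k=5):
    """Return the k surface points with the lowest E_es + E_ct, best first.

    Following the text, the hard-sphere exchange term is not evaluated
    explicitly: the most reactive sites are taken as the minima of
    E_es + E_ct over the surface points.
    """
    e_sum = np.asarray(e_es, dtype=float) + np.asarray(e_ct, dtype=float)
    order = np.argsort(e_sum, kind="stable")[:k]
    return np.asarray(points, dtype=float)[order], e_sum[order]
```

With the 4,000 to 6,000 points per surface mentioned earlier, a full sort is negligible in cost compared with the EH calculations that produce $E_{\text{ct}}$.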


To summarize, our model rests on the following assumptions: polarization
and dispersion energy components are neglected; the geometrical deformations of S when
attacked by R are not taken into account; and solvent effects are ignored so far. The model can
thus be used to describe the initial stage of attack in mainly kinetically controlled processes
exhibiting an early transition state.


Computational details

All the EH calculations have been performed using the single-zeta Slater-type atomic
orbitals of Clementi and Roetti (ref. 12), except for the d shell of the transition metal atoms,
which has been described by the double-zeta functions of the same authors. For all hydrogen
atoms and for the reactant, a 1s exponent of 1.0 has been used, as it results in slightly
better intermolecular interaction energies. For the evaluation of $E_{\text{es}}$, self-consistent charge
and configuration (SCCC) calculations have been performed with a quadratic dependence
of the valence state ionization energies (VSIE) of all the atoms. The calculation of $E_{\text{ct}}$,
however, has been carried out without the SCCC procedure, with the VSIE of the reactant, $H_{RR}$,
systematically chosen at $\varepsilon_{\text{HOMO}} + 0.2$ eV for electrophilic attack and $\varepsilon_{\text{LUMO}} - 0.5$ eV for
nucleophilic attack, $\varepsilon_{\text{HOMO}}$ and $\varepsilon_{\text{LUMO}}$ being the HOMO and LUMO energies of the
substrate, respectively. As $H_{RR}$ has to be chosen such that $\varepsilon_{\text{HOMO}} < H_{RR} < \varepsilon_{\text{LUMO}}$, small deviations
from the values optimized for soft reactants make it possible to take partial account, when
necessary, of the very nature of R: hard electrophiles are characterized by larger (i.e. less
negative) $H_{RR}$ values and hard nucleophiles by smaller (i.e. more negative) $H_{RR}$ values.
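The rule for the reactant's VSIE can be written as a small helper. The default offsets (0.2 eV above the HOMO, 0.5 eV below the LUMO) are those quoted above; the `shift` argument is a hypothetical knob illustrating the "small deviations" for harder reactants, not a documented parameter of the model.

```python
def reactant_vsie(e_homo, e_lumo, attack, shift=None):
    """Diagonal element H_RR (eV) of the model reactant's 1s orbital.

    Defaults follow the text: e_HOMO + 0.2 eV for an electrophile and
    e_LUMO - 0.5 eV for a nucleophile. A larger `shift` (hypothetical
    parameter) moves H_RR further from the frontier level, mimicking a harder
    reactant: higher for electrophiles, lower for nucleophiles.
    """
    if attack == "electrophilic":
        h_rr = e_homo + (0.2 if shift is None else shift)
    elif attack == "nucleophilic":
        h_rr = e_lumo - (0.5 if shift is None else shift)
    else:
        raise ValueError(f"unknown attack type: {attack!r}")
    if not e_homo < h_rr < e_lumo:
        raise ValueError("H_RR must lie strictly between e_HOMO and e_LUMO")
    return h_rr
```

The final check enforces the constraint $\varepsilon_{\text{HOMO}} < H_{RR} < \varepsilon_{\text{LUMO}}$, so an offset too large for a substrate with a small HOMO-LUMO gap is rejected rather than silently accepted.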

RESULTS AND DISCUSSION
Electrophilic addition to ferrocene

According to NMR studies (ref. 13), ferrocene protonates readily in strong acids to give
the reaction intermediate Fe(C5H5)2H+. The protonation site is located on the metal,
presumably in the equatorial plane. Ion thermochemistry experiments suggest that this is
probably also the case in the gas phase, though ring protonation in an exo position is also
possible, with a proton affinity at least 5 kcal/mol smaller than on the metal (ref. 14). A
theoretical model of this reaction should therefore predict the correct site of addition even
when solvent effects are not taken into consideration. The results obtained with our model are
presented in Fig. 1. It is immediately seen that the five lowest minima of $E_{\text{int}}$, reflecting the
five-fold symmetry axis of the molecule, are located in equatorial positions of the molecular
surface of Fe(C5H5)2 and therefore correspond to protonation on the metal. Furthermore, a
detailed examination of Fig. 1 reveals that secondary minima, corresponding to less
favorable interaction energies, are found on exo positions of the ligand rings, which
parallels the gas-phase measurements. Interestingly, the model correctly predicts the most
stable protonation site only when $E_{\text{ct}}$ is added to $E_{\text{es}}$: using the electrostatic term alone
as $E_{\text{int}}$ leads to the ligand site being predicted as the most favorable.
As a similar conclusion has been drawn from several other examples we have
studied, the MEP model should be used with caution for predicting
organometallic reaction mechanisms. This may be ascribed to the important charge
transfer effects between substrate and reactant, which are due to the d orbitals of the metal
atom belonging to S.

Fig. 1. Solid model of the molecular surface of ferrocene shaded according to the $E_{\text{int}}$
property for electrophilic attack. The shading range from black to grey to white
extends smoothly over the numerical range of $E_{\text{int}}$ from the most negative to zero to
the most positive values, which means that dark zones correspond to preferred
sites of attack. The shading scale on the right indicates the mapping of $E_{\text{int}}$ values, in
kcal/mol.

Sequential nucleophilic and electrophilic addition to arene-Cr(CO)3

The structure and reactivity of arene-chromium tricarbonyl, (η6-C6H6)Cr(CO)3, have
been the subject of intensive investigations in organometallic chemistry (ref. 15). In this
complex, as in most organometallic species containing unsaturated hydrocarbon ligands,
metal-arene bonding leads to a net charge transfer from the ring to the metal, with the
result that nucleophilic attack occurs readily on the exo face of the positively charged
hydrocarbon. One of us has recently shown that the (η6-C6H6)Cr(CO)3 complex may
undergo the following sequential reactions with high regioselectivity: (i) nucleophilic addition
of a reactive carbanion R⁻ to the exo face of the benzene ring; (ii) electrophilic
addition to the intermediate anionic complex (η5-C6H6R)Cr(CO)3⁻ directly at the metal
atom; (iii) CO migratory insertion into the metal-electrophile bond; and (iv) reductive elimination
to yield, after decomplexation, a 1,2-trans-disubstituted cyclohexadiene (ref. 16).

We have calculated the interaction energies between substrate and reactant for the first
two steps of this sequential mechanism (ref. 17). Figure 2 shows that the initial nucleophilic
attack is likely to occur on the face of the complexed arene opposite to the metal, though the
region with negative $E_{\text{int}}$ values extends broadly over the upper part of the surface towards
the metal atom.

The subsequent electrophilic attack has been modeled using an anionic
cyclohexadienyl-Cr(CO)3 adduct with R = 1,3-dithian (ref. 18). The results, presented in
Fig. 3, indicate that the electrophilic attack should take place on the metal, which is
in total agreement with a crystallographic study performed recently (ref. 18) on the
(η5-C6H6R)Cr(CO)3R′ complex, with R = 2-methyl-1,3-dithian and R′ = (Ph)3Sn. In the
crystal structure, the metal-Sn(Ph)3 bond is trans to the cyclohexadienyl sp3 carbon.
The present model, based on a combination of extended Hückel and
molecular graphics techniques, thus appears able to rationalize both the important changes in
reactivity that occur in organic fragments upon metal complexation and
the high regioselectivity observed in sequential additions to such
organometallic species. This, in turn, suggests that it can be used as a reliable tool for
modeling the reactivity of such compounds.

Fig. 3. Solid model of the molecular surface of the η5-[6-(1,3-dithian-2-yl)cyclohexadienyl]tricarbonylchromium(0)
anion shaded according to the $E_{\text{int}}$ value for electrophilic
attack. The arrow indicates the position of the minimum of $E_{\text{int}}$.
DISCUSSION


PERAHIA - Have you compared your results with those obtained using ab initio
methods?

MORGANTINI - Yes, for several small molecules (H2O, NH3,
CH3F...) we calculated the interaction energy for electrophilic or nucleophilic attacks
using Gaussian 80 and the Morokuma energy decomposition. We
also carried out similar calculations for ferrocene.
SUMMARY.

Following explicit formulae for the calculation of second-order exchange contributions
(induction as well as dispersion) within the framework of Symmetry-Adapted
Perturbation Theories (SAPT), exchange contributions can be expressed as a combination
of electrostatic interaction energies between suitably generalized charge
distributions (overlap intermolecular distributions). Numerical calculations for the
interaction of two water molecules are presented. The possibility of defining simple
analytical functions representing each contribution to the interaction energy is discussed.


INTRODUCTION 

In the field of theoretical evaluations of interaction energies, two types of approach
are generally distinguished. The first is the so-called supermolecule method
(ref. 1), in which the interaction energy is obtained as the difference between the total
energy of the interacting molecules (the supermolecule) and the sum of the total energies
of the monomers, all energies being calculated with the same method. However, the
dispersion contribution cannot be obtained at the SCF level: an extensive
CI calculation would be necessary to recover this important contribution. In the second
approach, the intermolecular interaction energy is calculated from perturbation theory,
using the intermolecular potential as the perturbing operator. When the intermolecular
distance R is large, one is dealing with Rayleigh-Schrödinger perturbation theory, in
which only simple products of monomer wavefunctions are used: owing to the large
separation between monomers, no antisymmetrization of the factorized wavefunctions
is necessary. For shorter distances, e.g. distances in the region around the
equilibrium configuration, the usual Rayleigh-Schrödinger perturbation theory must be
abandoned (ref. 2), and in order to take into account, at least to some extent, the
exchange of electrons between the interacting molecules, some form of exchange
perturbation theory (the so-called Symmetry-Adapted Perturbation Theories, SAPT; refs.
3-4) must be used. It is important to emphasize that this approach is particularly
attractive compared with the usual supermolecular approach, since the interaction energy
is decomposed into a sum of terms, each of which can be given some physical
interpretation (at least for terms up to and including second order). This is a very
appealing feature for a qualitative understanding of the interaction and can be very
helpful for the development of simplified formulas for intermolecular interactions.

To our knowledge, the first example of an exchange-perturbation theory calculation
is due to Jeziorski and van Hemert in their pioneering work on the water dimer (ref. 5).
Neglecting all intramonomer correlation effects, they evaluated the complete first-order
interaction energy $E^{(1)} = E^{(1)}_{\text{RS}} + E^{(1)}_{\text{exch}}$ (explicitly, the sum of the Rayleigh-Schrödinger and
first-order exchange energies) and only the Rayleigh-Schrödinger second-order
interaction energy.

Very recently, Hess et al. (ref. 6) have presented a new method of deriving explicit
formulas for the calculation of second-order exchange contributions (induction as well
as dispersion) within the framework of Symmetry-Adapted Perturbation Theory.
Numerical results for the interaction of two water molecules have been presented,
demonstrating the non-negligible role of the complete second-order exchange
contributions. It is well known, however, that the quality of the results depends strongly on the
size of the basis set used in the calculations, so such a method cannot be applied to
arbitrarily large systems. On the other hand, the ability to determine each component of the
intermolecular interaction energy with high accuracy opens a way towards
representing them through simple analytical functions fitted to values calculated within
this perturbation treatment. In the present work we are interested
in the development of simplified formulae for the calculation of the dispersion and
exchange-dispersion energies.

The organization of the present paper is as follows. In Secs. I and II we summarize
the formal development of the second-order exchange contributions derived by Hess et
al. (ref. 6), together with the most important results obtained by these authors for the
water dimer. Sec. III is devoted to the investigation of basis set effects upon the
different interaction energy components calculated with the method cited above. In
Sec. IV we present and discuss some simplified formulae elaborated for the dispersion
and exchange-dispersion contributions.


I. METHOD

We will just summarize the formal development of the second-order exchange contributions
presented by Hess et al. (ref. 6).

Following standard Symmetry-Adapted Perturbation Theories (refs. 3, 4), the complete
first- and second-order interaction energies are written as:


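$$E^{(1)} = \frac{\langle \Psi_0^A \Psi_0^B | V_{AB}\,\mathcal{A} | \Psi_0^A \Psi_0^B \rangle}{\langle \Psi_0^A \Psi_0^B | \mathcal{A} | \Psi_0^A \Psi_0^B \rangle} \qquad (1)$$

$$E^{(2)} = -\frac{\langle \Psi_0^A \Psi_0^B | V_{AB}\,R_0\,\mathcal{A}\,(V_{AB} - E^{(1)}) | \Psi_0^A \Psi_0^B \rangle}{\langle \Psi_0^A \Psi_0^B | \mathcal{A} | \Psi_0^A \Psi_0^B \rangle} \qquad (2)$$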


where $R_0$ denotes the reduced resolvent of $H_0$, given by


$$R_0 = {\sum_{i,j}}' \frac{|\Psi_i^A \Psi_j^B \rangle \langle \Psi_i^A \Psi_j^B|}{(E_i^A + E_j^B) - (E_0^A + E_0^B)} \qquad (3)$$


(the prime on $\sum'$ means, as usual, that the term corresponding to $i = 0$ and $j = 0$ is excluded
from the summation) and $\mathcal{A}$ is the intersystem antisymmetrizer, which is written in the
form (ref. 7)

$$\mathcal{A} = 1 - \mathcal{A}' = 1 - P^{(1)} + P^{(2)} - \cdots + (-1)^{N_{AB}} P^{(N_{AB})} \qquad (4)$$

where $P^{(1)} = \sum_{i \in A} \sum_{j \in B} P_{ij}$ denotes the sum of all permutations exchanging the (space and spin)
coordinates of electron $i$ of molecule A with the coordinates of electron $j$ of molecule B;
similar definitions hold for $P^{(2)}, \ldots, P^{(N_{AB})}$ ($N_{AB}$ denotes the smaller of $N_A$ and $N_B$, the
numbers of electrons of molecules A and B, respectively).

The second-order perturbation energy $E^{(2)}$ (Eq. (2)) may be decomposed into the usual
second-order Rayleigh-Schrödinger (RS) perturbation energy $E^{(2)}_{\text{RS}}$ (obtained by putting
$\mathcal{A} = 1$ in Eq. (2)) and the so-called second-order exchange energy $E^{(2)}_{\text{exch}}$:

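$$E^{(2)}_{\text{exch}} = E^{(2)} - E^{(2)}_{\text{RS}} = -\frac{\langle \Psi_0^A \Psi_0^B | (V_{AB} - E^{(1)})(\mathcal{A}' - \langle \mathcal{A}' \rangle) | \Phi^{(1)} \rangle}{\langle \mathcal{A} \rangle} \qquad (5)$$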


where $\langle \mathcal{A}' \rangle$ and $\langle \mathcal{A} \rangle$ are the expectation values of $\mathcal{A}'$ and $\mathcal{A}$ calculated with the
ground-state wavefunction $\Psi_0^A \Psi_0^B$, and $\Phi^{(1)}$ stands for the first-order correction to the
wavefunction in perturbation theory (ref. 4):

$$\Phi^{(1)} = -R_0\,V_{AB}\,\Psi_0^A \Psi_0^B \qquad (6)$$

Now, since multiple exchanges are expected to contribute weakly in the region around
the equilibrium geometry (refs. 8-9), only the leading contribution to $E^{(2)}_{\text{exch}}$, corresponding
to a single exchange of electrons between molecules A and B, has been calculated. Thus,
putting $\mathcal{A}' = P^{(1)}$ in Eq. (5) and neglecting terms that correspond to contributions
of order higher than $S^2$ (where $S$ stands for overlap integrals between orbitals of
monomers A and B), one obtains, within the Hartree-Fock formalism used below,

$$E^{(2)}_{\text{exch}} \approx -\langle \Psi_0^A \Psi_0^B | (V_{AB} - \langle V_{AB} \rangle)(P^{(1)} - \langle P^{(1)} \rangle) | \Phi^{(1)} \rangle \qquad (7)$$
Rewriting $\Phi^{(1)}$ (Eq. (6)) as follows

$$\Phi^{(1)} = \Phi^{(1)}_{\text{ind}}(A \to B) + \Phi^{(1)}_{\text{ind}}(B \to A) + \Phi^{(1)}_{\text{disp}} \qquad (8)$$

and inserting this decomposition of $\Phi^{(1)}$ into Eq. (7), the second-order exchange
energy decomposes into three terms

$$E^{(2)}_{\text{exch}} = E^{(2)}_{\text{exch-ind}}(A \to B) + E^{(2)}_{\text{exch-ind}}(B \to A) + E^{(2)}_{\text{exch-disp}} \qquad (9)$$

The sum of the first two terms in Eq. (9) will be referred to in the following as the
exchange-induction energy, while $E^{(2)}_{\text{exch-disp}}$ will be referred to as the exchange-dispersion
energy.

Following Claverie (ref. 7), the method adopted by Hess et al. (ref. 6) essentially
expresses the exchange contributions as a combination of formal electrostatic
interaction energies between suitably generalized charge distributions (the so-called overlap
intermolecular charge distributions). Two basic ingredients are used to do this,
namely:

1) The possibility of reducing the action of the intersystem antisymmetrizer (appearing in
SAPT) on factorized SCF wavefunctions to a sum of simple products of SCF
determinants pertaining to each subsystem, namely

$$P^{(1)}\left[\Psi^A \Psi^B\right] = \sum_{i \in A} \sum_{j \in B} \Psi^A(b_j/a_i)\,\Psi^B(a_i/b_j) \qquad (10)$$

where the summation is over the spin-orbitals of the determinants $\Psi^A$ (here labeled by $i$) and
$\Psi^B$ (labeled by $j$), and $\Psi^A(b_j/a_i)$ denotes $\Psi^A$ with spin-orbital $a_i$ replaced by spin-orbital $b_j$
(and conversely for $\Psi^B(a_i/b_j)$). Using Eq. (10), all integrals involving functions of the type
$P^{(1)}[\Psi^A \Psi^B]$ are reduced to sums of integrals involving simple products $\Psi^A(b_j/a_i)\,\Psi^B(a_i/b_j)$
of "opposite transfer" determinants.

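2) The next step consists of the use of the so-called Longuet-Higgins representation
of the interaction operator $V_{AB}$ in terms of the molecular charge distributions $\hat\rho_M$
($M$ = A, B) (ref. 10), namely:

$$V_{AB} = \iint \frac{\hat\rho_A(\mathbf{r}_1)\,\hat\rho_B(\mathbf{r}_2)}{|\mathbf{r}_1 - \mathbf{r}_2|}\,d\mathbf{r}_1\,d\mathbf{r}_2 \qquad (11)$$

$$\hat\rho_M(\mathbf{r}) = \sum_{\alpha \in M} Z_\alpha\,\delta(\mathbf{r} - \mathbf{R}_\alpha) - \sum_{i \in M} \delta(\mathbf{r} - \mathbf{r}_i) \qquad (12)$$

where $Z_\alpha$ and $\mathbf{R}_\alpha$ are the charges and positions of the nuclei of $M$ and $\mathbf{r}_i$ are the coordinates of its electrons.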


The use of these two ingredients gives the development a systematic
character. Within the SCF approximation, the different components of the interaction
energy are now written as appropriate combinations of (mono- and bielectronic, and overlap)
integrals involving spin-orbitals of the closed-shell isolated systems (for more details see ref.
6).

Now, when performing the practical evaluation of the quantities needed in the
above calculation, one faces the well-known problem of summing expressions
defined over the infinite set of unoccupied orbitals of the Fock operator, including those belonging to the
continuous spectrum. As pointed out by Jeziorski and van Hemert (ref. 5), such
summations are practically impossible to carry out. To overcome this difficulty, the
variational-perturbation method proposed by these authors has been used. This method,
which is essentially based on the minimization of a Hylleraas-type functional, has
already been described in detail (see e.g. refs. 5, 11).
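The idea behind the variational-perturbation step can be illustrated in a finite basis. Assuming real trial functions $\chi_k$ orthogonal to the unperturbed ground state, the Hylleraas functional becomes the quadratic form $J(c) = c^{T} M c + 2\,c^{T} b$ with $M_{kl} = \langle \chi_k | H_0 - E_0 | \chi_l \rangle$ and $b_k = \langle \chi_k | V | \Psi_0^A \Psi_0^B \rangle$; its minimum, $-b^{T} M^{-1} b$, is an upper bound to $E^{(2)}$, and no sum over the continuum is needed. This is a generic textbook sketch with a hypothetical function name, not the implementation of refs. 5 and 11.

```python
import numpy as np


def hylleraas_minimum(m, b):
    """Minimise the (real) Hylleraas functional J(c) = c^T M c + 2 c^T b.

    m: M_kl = <chi_k|H0 - E0|chi_l> (symmetric, positive definite);
    b: b_k = <chi_k|V|Phi0>. Returns (c_opt, J_min) with J_min = -b^T M^-1 b,
    an upper bound to the exact second-order energy.
    """
    m = np.asarray(m, dtype=float)
    b = np.asarray(b, dtype=float)
    c = -np.linalg.solve(m, b)           # stationarity condition: M c = -b
    j_min = float(c @ m @ c + 2.0 * c @ b)
    return c, j_min
```

When the trial functions happen to be eigenfunctions of $H_0$, $M$ is diagonal and $J_{\min}$ reduces to the familiar sum-over-states expression $-\sum_k b_k^2 / (E_k - E_0)$ restricted to those states.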


II. NUMERICAL RESULTS AND DISCUSSION

All calculations have been done for a fixed relative orientation of the two interacting
water molecules, varying only the distance $R_{OO}$ between the two oxygen atoms.
To facilitate comparisons, the fixed orientation has been chosen to be identical
with that used by Jeziorski and van Hemert in their original work on the water dimer
(ref. 5). Calculations have been performed using a substantially larger basis set. The
so-called isotropic part of the basis (functions describing orbitals occupied in the
ground states of the atoms, see ref. 12) has been taken from ref. 13 and consists of a set
of (13s8p) and (6s) functions on the oxygens and hydrogens, respectively. This basis set
has been extended with a set of (2d) and (2p) polarization functions on oxygen and
hydrogen, respectively. The exponents were chosen to minimize the dispersion
as well as the complementary exchange energies (see ref. 7); the values obtained are
$\alpha_d$ = 1 and 0.3 (oxygen) and $\alpha_p$ = 0.6 and 0.15 (hydrogen). The complete contracted basis consists of 94 basis
functions for the water dimer.

The energy of the water monomer calculated with this basis set equals -76.06004
a.u. The SCF binding energies obtained for the water dimer are -3.96 kcal/mol and -3.73
kcal/mol without and with the counterpoise (CP) correction, respectively. The latter value
agrees very well with the SCF limit of -3.73 ± 0.05 kcal/mol (including the CP correction)
recently estimated by Szalewicz et al. (ref. 14) using a very large basis set containing
212 contracted orbitals. The values of the particular contributions to the interaction
energy are listed in Table I.
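The supermolecular binding energies quoted above are simple differences of total energies. A minimal sketch, assuming total energies in hartree and the standard conversion of 627.5095 kcal/mol per hartree; for the counterpoise-corrected value the monomer energies must be recomputed in the full dimer basis. The function names are hypothetical.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # conversion factor, kcal/mol per hartree


def supermolecular_interaction(e_dimer, e_mono_a, e_mono_b):
    """E_int = E(AB) - E(A) - E(B): inputs in hartree, result in kcal/mol.

    For the CP-corrected value, E(A) and E(B) must be computed in the full
    dimer basis (ghost functions on the partner); otherwise in the monomer basis.
    """
    return (e_dimer - e_mono_a - e_mono_b) * HARTREE_TO_KCAL_PER_MOL


def bsse_estimate(e_int_uncorrected, e_int_cp):
    """Basis set superposition error: how much the uncorrected value over-binds."""
    return e_int_cp - e_int_uncorrected
```

With the two SCF binding energies quoted above, `bsse_estimate(-3.96, -3.73)` gives a counterpoise correction of 0.23 kcal/mol for the 94 AO basis.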

The essential results to point out are the following:

1) The second-order exchange-induction energy was found to be quite important: at the
equilibrium distance, it compensates approximately 50% of the induction energy. The
importance of this contribution has already been noticed for inert gas dimers (refs.
9, 11, 15, 16).
2) The second-order exchange-dispersion energy represents about 20% of the dispersion
energy, confirming the non-negligible role of this contribution.
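Both percentages can be checked directly against the $R_{OO}$ = 5.67 a.u. row of Table I. A minimal sketch using only values copied from that table (kcal/mol); the helper name is hypothetical.

```python
# Second-order terms at R_OO = 5.67 a.u., copied from Table I (kcal/mol).
E_IND, E_DISP = -1.82, -1.79
E_EXCH_IND, E_EXCH_DISP = 0.99, 0.32


def quenched_fraction(exchange_term, polarization_term):
    """Fraction of a (negative) polarization term cancelled by its exchange partner."""
    return -exchange_term / polarization_term


ind_fraction = quenched_fraction(E_EXCH_IND, E_IND)     # about 0.54
disp_fraction = quenched_fraction(E_EXCH_DISP, E_DISP)  # about 0.18
print(f"exchange-induction / induction:   {ind_fraction:.0%}")
print(f"exchange-dispersion / dispersion: {disp_fraction:.0%}")
```

The same ratios computed at shorter distances in Table I grow (for induction, 14.28/21.25 at 4.40 a.u.), showing that exchange quenching becomes more severe as the monomers approach.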


R_OO^b   E^(1)_RS   E^(1)_exch   E^(2)_ind   E^(2)_disp   E^(2)_exch-ind   E^(2)_exch-disp
4.40     -23.66      50.31       -21.25       -8.90        14.28             3.32
4.80     -16.68      24.19        -9.42       -5.27         6.19             1.51
5.20     -10.81      11.61        -4.37       -3.18         2.70             0.75
5.67      -6.89       4.85        -1.82       -1.79         0.99             0.32
7.00      -2.67       0.39        -0.22       -0.46         0.06             0.03
9.00      -1.05       0.01        -0.03       -0.09         0.00             0.00

Table I. Particular contributions to the interaction energy of the water dimer (in
kcal/mol) calculated with a 94 AO basis set.^a

a. Basis set described in the text.

b. Atomic units.

It seemed interesting to compare the SCF binding energy with the sum of the
complete first-order and second-order induction energies; these values are displayed in
Table II (columns 3 and 2, respectively).


Columns: (1) R_OO^a; (2) E^(1) + E^(2)_ind + E^(2)_exch-ind; (3) E^SCF_int; (4) E_PT^b;
(5) E^SCF_int + E^(2)_disp; (6) E^SCF_int + E^(2)_disp + E^(2)_exch-disp

(1)      (2)      (3)      (4)      (5)      (6)
4.40     16.68    10.97    11.10     2.07     5.39
4.80      4.28     1.08     0.52    -4.19    -2.68
5.20     -0.87    -2.62    -3.30    -5.80    -5.05
5.67     -2.87    -3.73    -4.34    -5.52    -5.20
7.00     -2.44    -2.55    -2.87    -3.01    -2.98
9.00     -1.07    -1.08    -1.15    -1.16    -1.16

Table II. Comparison of the SCF and perturbation theory interaction energies for the
water dimer (in kcal/mol) calculated with the 94 AO basis set.

a. Atomic units.

b. Pure perturbational interaction energy calculated as
$E_{\text{PT}} = E^{(1)}_{\text{RS}} + E^{(1)}_{\text{exch}} + E^{(2)}_{\text{ind}} + E^{(2)}_{\text{exch-ind}} + E^{(2)}_{\text{disp}} + E^{(2)}_{\text{exch-disp}}$.

Except at large distances, Table II clearly demonstrates that these two quantities do not
coincide. In fact, the additional terms present in the SCF binding energy (the induction
part of third- and higher-order Rayleigh-Schrödinger terms, and some intramolecular
correlation contribution introduced when doing an SCF supermolecule calculation
(ref. 8)) contribute in a non-negligible way, even in the neighborhood of the equilibrium
geometry. It may be noticed that these additional contributions become more and more
important as the intermolecular distance decreases. One might expect the
difference between $E^{\text{SCF}}_{\text{int}}$ and $E^{(1)} + E^{(2)}_{\text{ind}}$ to be partly cancelled if the induction part of
third- and higher-order contributions were considered in the perturbational approach.
Thus, at present, one possibility is to calculate the interaction energy by
adding to the SCF binding energy the dispersion term calculated within a perturbation
method; in that case, however, one must be careful to also take into account the
exchange-dispersion term. One thus has to use the following decomposition:

$$E_{\text{int}} = E^{\text{SCF}}_{\text{int}} + E^{(2)}_{\text{disp}} + E^{(2)}_{\text{exch-disp}} \qquad (13)$$

Table II shows that the location of the energy minimum differs depending on whether the
second-order exchange-dispersion term is or is not taken into account. The values
we have obtained are $R_{OO}$ = 5.67 a.u. and 5.20 a.u., respectively.
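The two minima can be reproduced from the tabulated data with Eq. (13). The sketch below rebuilds the last two columns of Table II from its SCF column and the dispersion terms of Table I, then takes the lowest value on the six-point grid; the helper names are hypothetical, and a finer scan would be needed to locate the true minimum between grid points.

```python
import numpy as np

# Values copied from Tables I and II (R in a.u., energies in kcal/mol).
R_OO = [4.40, 4.80, 5.20, 5.67, 7.00, 9.00]
E_SCF = [10.97, 1.08, -2.62, -3.73, -2.55, -1.08]
E_DISP = [-8.90, -5.27, -3.18, -1.79, -0.46, -0.09]
E_EXCH_DISP = [3.32, 1.51, 0.75, 0.32, 0.03, 0.00]


def eq13(e_scf, e_disp, e_exch_disp, include_exchange_dispersion=True):
    """Eq. (13): SCF binding energy plus (exchange-)dispersion corrections."""
    total = np.asarray(e_scf, dtype=float) + np.asarray(e_disp, dtype=float)
    if include_exchange_dispersion:
        total = total + np.asarray(e_exch_disp, dtype=float)
    return total


def grid_minimum(r, e):
    """Distance and energy of the lowest tabulated point."""
    i = int(np.argmin(e))
    return r[i], float(e[i])


e_with = eq13(E_SCF, E_DISP, E_EXCH_DISP)
e_without = eq13(E_SCF, E_DISP, E_EXCH_DISP, include_exchange_dispersion=False)
print(grid_minimum(R_OO, e_with), grid_minimum(R_OO, e_without))
```

Dropping the repulsive exchange-dispersion term deepens and shifts the well inwards, which is exactly the 5.67 a.u. versus 5.20 a.u. difference noted above.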


III. IMPORTANCE OF THE BASIS SET QUALITY
Now we will pay some attention to the important problem of the quality of the basis
set used. The results listed in Table III clearly show that the different components of the
perturbation development are quite sensitive to the choice of basis set.


Basis             STO-3G    4-31G    6-31G**    JvH^a    94 AO^b
E^(1)_RS          -4.12     -8.88    -7.13      -7.11    -6.89
E^(1)_exch         1.72      2.10     2.60       4.89     4.85
E^(2)_ind         -0.64     -0.94    -1.04      -1.63    -1.82
E^(2)_disp        -0.20     -0.42    -0.93      -1.54    -1.79
E^(2)_exch-ind     0.45      0.25     0.33       0.80     0.99
E^(2)_exch-disp    0.04      0.05     0.13       0.27     0.32
E_PT (total)      -2.75     -7.84    -6.04      -4.32    -4.34
Dipole moment      1.72      2.60     2.18       2.06     1.98

Table III. Different components of intermolecular interaction energies calculated with
different basis sets. All energy values are given in kcal/mol; dipole moments are
given in debyes. (a) JvH stands for a Gaussian basis (11,7,2/6,1) contracted to
(4,3,2/2,1), i.e. 70 functions for the dimer (ref. 5). (b) This basis set, containing 94
AO for the dimer, is described in section II. Results obtained using the geometry
of ref. 5 with $R_{OO}$ = 5.67 a.u.
1) The electrostatic component is well reproduced only if the wavefunction of the
unperturbed system correctly describes the charge distribution of the isolated molecules
(monomers). The calculation of multipole moments provides a good criterion for the
quality of the basis set. In this work we have calculated the dipole moment of the water
molecule with the different basis sets studied; the results, presented in Table III (last line),
are to be compared with the experimental value of 1.85 debyes (ref. 17). With the
very large basis set (47 AO for the water monomer), we have obtained a value in
very good agreement with the Hartree-Fock limit value (1.98 debyes) estimated by
Szalewicz et al. (ref. 14). Indeed, it is well known that at the SCF level dipole moments
are calculated with an error of about 10% because of the lack of electron correlation
at this level (refs. 12, 14).

2) The exchange energy increases with the size of the basis set used; even the 6-31G** basis set underestimates this contribution. This result stems from the imperfect behaviour of the wave function at long range, which leads to an underestimation of the overlap effects between the electronic clouds of the different subsystems. It has been noticed that nearly the same values are obtained with the basis sets including 70 and 94 AO for the water dimer.

It has appeared interesting to compare the first-order energy we have calculated with the so-called Heitler-London first-order energy, defined as:

$$E_{\mathrm{HL}} = \frac{\langle \Psi_0 | H | \Psi_0 \rangle}{\langle \Psi_0 | \Psi_0 \rangle} \qquad (14)$$


where Ψ0 is the antisymmetrized product of the exact wave functions of the two monomers,

$$\Psi_0 = \mathcal{A}\,\Phi_0 = \mathcal{A}\,\Psi_0^{A}\,\Psi_0^{B} \qquad (15)$$

and H is the total Hamiltonian of the interacting system. In that case (and only in that case) the Heitler-London energy may be written as:


$$E_{\mathrm{HL}} = E_0 + \frac{\langle \Phi_0 | V_{AB} | \mathcal{A}\Phi_0 \rangle}{\langle \Phi_0 | \mathcal{A}\Phi_0 \rangle} \qquad (16)$$


where E0 is the eigenvalue of H0, and the second term of this equation is nothing else than the total first-order perturbation component. In practice we do not use exact wave functions for the isolated molecules, so Eq. (16) is written as:

$$E_{\mathrm{HL}} = \bar{E}_0 + \frac{\langle \Phi_0 | (H_0 - \bar{E}_0) | \mathcal{A}\Phi_0 \rangle}{\langle \Phi_0 | \mathcal{A}\Phi_0 \rangle} + \frac{\langle \Phi_0 | V_{AB} | \mathcal{A}\Phi_0 \rangle}{\langle \Phi_0 | \mathcal{A}\Phi_0 \rangle} \qquad (17)$$


where Ē0 represents the mean energy associated with the approximate wave functions of the monomers. The last term on the right-hand side of Eq. (17) is the first-order perturbation contribution, including both Rayleigh-Schrödinger and exchange terms; the second term, called the complementary exchange energy, represents a correction which vanishes when Φ0 is an exact eigenfunction of H0. The Heitler-London energy is thus decomposed into:

$$E_{\mathrm{HL}} = \bar{E}_0 + E^{(1)}_{\mathrm{RS}} + E^{(1)}_{\mathrm{exch}} + E_{\mathrm{exch\text{-}compl}} \qquad (18)$$


We may denote:

$$E^{(1)}_{\mathrm{exch\text{-}HL}} = E^{(1)}_{\mathrm{exch}} + E_{\mathrm{exch\text{-}compl}} \qquad (19)$$


The value of the complementary exchange energy is a good means of checking the quality of the basis set (for more details see refs. 7, 17). Table IV displays the calculated values of E(1)exch, E exch-compl and E(1)exch-HL. It appears that only very large basis sets (at least 70 AO for the water dimer) lead to correct values of the first-order exchange contributions.


Basis               STO-3G   4-31G   6-31G**   JvH (a)     (b)
E(1) exch             1.72    2.10      2.60      4.89    4.85
E exch-compl          1.74    1.48      1.12     -0.32   -0.17
E(1) exch-HL          3.46    3.58      3.72      4.57    4.68

Table IV. Dependence of the contributions E(1)exch, E exch-compl and E(1)exch-HL on the basis set. For (a) and (b) see the comments on Table III. All values (in kcal/mol) have been calculated for the geometry of ref. 5 and for R_OO = 5.67 a.u.
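Equation (19) can be checked directly on Table IV, and the relative size of the complementary term gives a quick measure of basis set quality. A minimal sketch using the tabulated values; the function names are illustrative only.

```python
"""Check Eq. (19) on the Table IV values and measure the complementary term."""

BASES = ["STO-3G", "4-31G", "6-31G**", "JvH (a)", "(b)"]
E1_EXCH = [1.72, 2.10, 2.60, 4.89, 4.85]         # kcal/mol
E_EXCH_COMPL = [1.74, 1.48, 1.12, -0.32, -0.17]  # kcal/mol
E1_EXCH_HL = [3.46, 3.58, 3.72, 4.57, 4.68]      # kcal/mol (tabulated)


def heitler_london_exchange(e1_exch, e_compl):
    """Eq. (19): E(1)exch-HL = E(1)exch + E exch-compl."""
    return [a + b for a, b in zip(e1_exch, e_compl)]


def complementary_fraction(e1_exch, e_compl):
    """|E exch-compl| as a fraction of E(1)exch-HL, one value per basis."""
    return [abs(b) / (a + b) for a, b in zip(e1_exch, e_compl)]


if __name__ == "__main__":
    hl = heitler_london_exchange(E1_EXCH, E_EXCH_COMPL)
    frac = complementary_fraction(E1_EXCH, E_EXCH_COMPL)
    for basis, h, t, f in zip(BASES, hl, E1_EXCH_HL, frac):
        print(f"{basis:8s} {h:5.2f} (table {t:5.2f})  |compl|/HL = {f:.0%}")
```

With these numbers the complementary term falls from about half of E(1)exch-HL with STO-3G to under 4% with basis (b).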


3) The total induction and dispersion terms are correctly taken into account only if very large basis sets are used. We may notice that the basis consisting of 94 AO, which includes neither f orbitals on oxygen nor d orbitals on hydrogen, leads to a value of the dispersion term (-1.79 kcal/mol, Table III) which is very close to the limit value estimated by Szalewicz et al. (-2. kcal/mol) (ref. 14).


IV. DERIVATION OF SEMI-EMPIRICAL FORMULAE FOR SECOND-ORDER
DISPERSION CONTRIBUTION

These two contributions have been calculated as sums of atom-atom interactions:

$$E^{(2)}_{X} = \sum_{i \in A} \sum_{j \in B} E^{(2)}_{X,ij} \qquad (20)$$

where i (and j) are atoms belonging to molecule A (and B); the subscript X stands for dispersion or exchange-dispersion.


The dispersion contribution is represented by:

$$E^{(2)}_{\mathrm{disp},ij} = -\left(\frac{C_6}{z^6} + \frac{C_8}{z^8} + \frac{C_{10}}{z^{10}}\right) k_i k_j \qquad (21)$$

with z = R/R0 and R0 = [(2R^w_i)(2R^w_j)]^(1/2), where R^w_i and R^w_j are the van der Waals radii of atoms i and j, and R is the distance between atoms i and j. The factors k_i, k_j allow the energy minimum of E = E(2)disp + E(2)exch-disp to take different values according to the atomic species involved (ref. 21).

The coefficients C6, C8 and C10 have been calculated by identifying Eq. (21) with the expression given by Stogryn et al. (ref. 19) for the He...He interaction:

$$E^{(2)}_{\mathrm{disp}}(\mathrm{He\cdots He}) = -\left(\frac{1.471}{R^6} + \frac{14.1}{R^8} + \frac{182.0}{R^{10}}\right) \qquad (22)$$

Namely: C6 = 0.143, C8 = 0.0381 and C10 = 0.0137 kcal/mol (z being dimensionless). The terms beyond C6/R^6 are not negligible at the equilibrium distance: for He...He, for instance, these two terms amount to 1/3 of the main term -C6/R^6.
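As an illustration of Eqs. (20) and (21), the following sketch sums the multipolar dispersion over atom pairs. It assumes Cartesian positions and van der Waals radii given in the same length unit, returns energies in whatever unit the C_n coefficients carry, and leaves out the short-range reduction of the C_n discussed next; the atom tuples and radii in the usage block are hypothetical.

```python
"""Atom-atom dispersion energy, Eqs. (20)-(21), multipolar part only."""
import math
from itertools import product

# Coefficients quoted in the text; z = R/R0 is dimensionless.
C6, C8, C10 = 0.143, 0.0381, 0.0137


def reduced_distance(r, rw_i, rw_j):
    """z = R / R0 with R0 = sqrt((2 Rw_i)(2 Rw_j))."""
    r0 = math.sqrt((2.0 * rw_i) * (2.0 * rw_j))
    return r / r0


def e_disp_pair(r, rw_i, rw_j, k_i=1.0, k_j=1.0):
    """Eq. (21): -(C6/z^6 + C8/z^8 + C10/z^10) k_i k_j."""
    z = reduced_distance(r, rw_i, rw_j)
    return -(C6 / z**6 + C8 / z**8 + C10 / z**10) * k_i * k_j


def e_disp_total(atoms_a, atoms_b):
    """Eq. (20): sum over i in A and j in B.

    Each atom is a tuple (xyz, van der Waals radius, k factor).
    """
    return sum(
        e_disp_pair(math.dist(xa, xb), rwa, rwb, ka, kb)
        for (xa, rwa, ka), (xb, rwb, kb) in product(atoms_a, atoms_b)
    )


if __name__ == "__main__":
    # Hypothetical one-atom "molecules" 3.5 length units apart, Rw = 1.5, k = 1.
    a = [((0.0, 0.0, 0.0), 1.5, 1.0)]
    b = [((3.5, 0.0, 0.0), 1.5, 1.0)]
    print(e_disp_total(a, b))
```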

However, it is well known that the multipolar part (~1/R^n) of the dispersion energy overestimates it at short distances, owing to the neglect of the penetration part of the intermolecular integrals which appear in the numerators of the perturbation expansion (for more details see ref. 20 and references therein). In order to take into account the reduction of the different multipolar terms, we have applied the procedure defined by Caillet et al. (ref. 21) for the sixth-power term of dispersion. Namely, we choose two distances, R_M = R^w_i + R^w_j and a shorter distance R_m. Then for R > R_M we use the normal parameters C_n (n = 6, 8, 10), for R < R_m we use modified, reduced parameters C'_n, and for R_m < R < R_M we use values of these parameters interpolated according to:


$$C_n(x) = \frac{C_n + C'_n}{2} + \left(0.375x^5 - 1.25x^3 + 1.875x\right)\frac{C_n - C'_n}{2} \qquad (23)$$

where:

$$x = \frac{R - (R_M + R_m)/2}{(R_M - R_m)/2}$$

The polynomial P(x) = 0.375x^5 - 1.25x^3 + 1.875x has been chosen so that: a) P(1) = 1 and P(-1) = -1; b) the first and second derivatives of C_n are continuous at R_m and R_M, since P'(±1) = P''(±1) = 0.

The reduced parameters are C'_6 = C_6/6.25, C'_8 = C_8/7.38 and C'_10 = C_10/10.44.
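The switching procedure of Eq. (23) is easy to check numerically. In this sketch the variable x is taken as the position of R inside the interval [R_m, R_M] rescaled to [-1, 1], which is the choice consistent with P(-1) = -1 and P(1) = 1; the window R_m = 2.0, R_M = 3.0 in the usage block is hypothetical, the text only requiring R_m < R_M.

```python
"""Distance-dependent reduction of C_n, Eq. (23)."""

# Reduction factors C'_n = C_n / factor, from the text.
REDUCTION = {6: 6.25, 8: 7.38, 10: 10.44}


def p_switch(x):
    """P(x) = 0.375 x^5 - 1.25 x^3 + 1.875 x: P(+-1) = +-1, P' = P'' = 0 there."""
    return 0.375 * x**5 - 1.25 * x**3 + 1.875 * x


def damped_coefficient(r, c_n, n, r_min, r_max):
    """C_n for R > R_M, C'_n for R < R_m, P(x) interpolation in between."""
    c_reduced = c_n / REDUCTION[n]
    if r >= r_max:
        return c_n
    if r <= r_min:
        return c_reduced
    x = (r - 0.5 * (r_max + r_min)) / (0.5 * (r_max - r_min))
    return 0.5 * (c_n + c_reduced) + p_switch(x) * 0.5 * (c_n - c_reduced)


if __name__ == "__main__":
    # Hypothetical switching window R_m = 2.0, R_M = 3.0 (same length unit as R).
    for r in (1.5, 2.0, 2.25, 2.5, 2.75, 3.0, 3.5):
        print(r, round(damped_coefficient(r, 0.143, 6, 2.0, 3.0), 5))
```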

Using the geometrical arrangement studied by Jeziorski et al. (ref. 5) and varying only R_OO, the second-order terms have been fitted to the values calculated by the SAPT method.


It has been found that the exchange-dispersion component, which is purely short-range, varies exponentially with the distance R. The best fit has been given by the following analytical function:

$$E^{(2)}_{\mathrm{exch\text{-}disp},ij} = k_i k_j \left(1 - \frac{Q_i}{N_i^{\mathrm{val}}}\right)\left(1 - \frac{Q_j}{N_j^{\mathrm{val}}}\right) C\, e^{-\alpha z} \qquad (24)$$

where Q_x (x = i or j) is the net charge of atom x and N^val_x the number of valence electrons of atom x.

In the same way, as recommended by Caillet et al. (ref. 21), we have used the factor (1 - Q_x/N^val_x), which accounts for the influence of the actual electronic population of each atom on the short-range terms.

C = 484.98 kcal/mol and α = 9.18.
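Equation (24) can be written as a pair function in the same style. This sketch assumes that the exponential is exp(-αz) in the reduced distance z = R/R0 of Eq. (21) and uses the fitted C and α quoted above; the radii, net charges and valence populations in the tests are hypothetical.

```python
"""Atom-atom exchange-dispersion term in the form of Eq. (24)."""
import math

C_EXCH_DISP = 484.98  # kcal/mol, from the text
ALPHA = 9.18          # dimensionless, multiplies z = R/R0 (assumption)


def population_factor(q, n_val):
    """(1 - Q/N_val): scales the term by the atom's actual valence population."""
    return 1.0 - q / n_val


def e_exch_disp_pair(r, rw_i, rw_j, q_i, q_j, nval_i, nval_j, k_i=1.0, k_j=1.0):
    """k_i k_j (1 - Q_i/N_i)(1 - Q_j/N_j) C exp(-alpha z), z = R/R0."""
    z = r / math.sqrt((2.0 * rw_i) * (2.0 * rw_j))
    return (k_i * k_j
            * population_factor(q_i, nval_i) * population_factor(q_j, nval_j)
            * C_EXCH_DISP * math.exp(-ALPHA * z))


if __name__ == "__main__":
    # Hypothetical O...H pair: Q_O = -0.7 (6 valence e-), Q_H = +0.35 (1 valence e-).
    print(e_exch_disp_pair(3.0, 1.5, 1.2, -0.7, 0.35, 6, 1))
```

A positive net charge depletes the valence shell and weakens the term, while a negative one strengthens it, which is the role the text assigns to the population factor.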

3) Results.

R_OO (a.u.)   E(2)disp (multipolar)   E(2)disp (with penetration)   E(2)exch-disp
4.50          -20.87                  -7.65 (-7.80)                 2.57 (2.92)
4.80          -13.88                  -5.04 (-5.27)                 1.52 (1.51)
5.00           -8.73                  -3.98 (-4.10)                 1.07 (1.07)
5.20           -5.77                  -3.20 (-3.18)                 0.75 (0.75)
5.67           -2.34                  -1.91 (-1.80)                 0.33 (0.32)
6.00           -1.47                  -1.33 (-1.22)                 0.19 (0.18)
7.00           -0.42                  -0.42 (-0.47)                 0.03 (0.03)

Table V. Values (in kcal/mol) calculated with the simplified formulae and by the ab initio SAPT method (in parentheses); for notations see the text.


In Table V we have listed the values of the multipolar part of the second-order dispersion (Eq. 21), the values obtained for E(2)disp when the penetration part is taken into account, and the values of the second-order exchange-dispersion term; the values in parentheses are those calculated with the SAPT method (ref. 6). It may be noticed that even in the equilibrium region, taking into account only the multipolar part of the dispersion overestimates this contribution. The agreement between the values calculated with the simplified formulae and with the SAPT method is quite good for the second-order exchange-dispersion energy, and good enough for the second-order dispersion term.
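The comparison in Table V can be quantified by taking differences with the SAPT values. A small sketch; the column indices refer to the layout of the TABLE_V dictionary defined here, not to any published data file.

```python
"""Deviation of the simplified formulae from SAPT, using Table V."""

# R_OO (a.u.): (multipolar E_disp, damped E_disp, SAPT E_disp,
#               E_exch-disp formula, SAPT E_exch-disp), all in kcal/mol.
TABLE_V = {
    4.50: (-20.87, -7.65, -7.80, 2.57, 2.92),
    4.80: (-13.88, -5.04, -5.27, 1.52, 1.51),
    5.00: (-8.73, -3.98, -4.10, 1.07, 1.07),
    5.20: (-5.77, -3.20, -3.18, 0.75, 0.75),
    5.67: (-2.34, -1.91, -1.80, 0.33, 0.32),
    6.00: (-1.47, -1.33, -1.22, 0.19, 0.18),
    7.00: (-0.42, -0.42, -0.47, 0.03, 0.03),
}


def deviations(model_col, sapt_col):
    """Return {R_OO: model - SAPT} for two columns of TABLE_V."""
    return {r: row[model_col] - row[sapt_col] for r, row in TABLE_V.items()}


if __name__ == "__main__":
    multipolar = deviations(0, 2)
    damped = deviations(1, 2)
    exch_disp = deviations(3, 4)
    for r in TABLE_V:
        print(f"R={r:4.2f}  multipolar {multipolar[r]:+6.2f}  "
              f"damped {damped[r]:+5.2f}  exch-disp {exch_disp[r]:+5.2f}")
```

The exchange-dispersion formula stays within 0.01 kcal/mol of SAPT from 4.80 a.u. outward and is off by 0.35 kcal/mol at 4.50 a.u., while the damped dispersion stays within about 0.25 kcal/mol over the whole range.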
V. CONCLUSION

At this point, it is important to emphasize that the goal of this work was not to obtain a very accurate value of the interaction energy between two water molecules, since it is clear that for such a simple system the supermolecule approach based on very large CI (refs. 10, 11) is preferable. Actually, one of the basic motivations of our work was to demonstrate the non-negligible role of the complete second-order exchange contribution (exchange-induction as well as exchange-dispersion components). Furthermore, the ability to determine quantitatively the importance of each component of the total interaction energy has opened the way towards representing them through simple analytical functions fitted on calculated values. It has appeared that such a fitting has to start from results obtained within a very large basis set. But, as discussed in this paper, the induction part of the third-order (and perhaps higher-order) contributions should be considered. In fact, these contributions may be obtained from SCF results. As a first step, simplified formulae for the calculation of the second-order dispersion (including exchange-dispersion) terms have been given. Work is in progress in order to verify other contributions (mainly the first-order exchange energy) of the interaction energy. Such a possibility is essential with respect to the problem of elaborating high-quality simplified functions for the calculation of the interaction energy between arbitrarily large molecules.
WATER MOLECULE IN DIFFERENT CRYSTAL SURROUNDINGS:

PERIODIC  HARTREE-FOCK  AND  MODEL  HAMILTONIAN  CALCULATIONS 

J.  G.  ANGYAN  and  B.  SILVI 

Laboratoire de Spectrochimie Moléculaire (UA 508), Université Pierre et Marie Curie,

4, Place Jussieu, 75230 PARIS Cedex (France)

ABSTRACT. The effect of the surroundings in molecular crystals has been studied at three levels of approximation: by the periodic Hartree-Fock method, by the self-consistent Madelung potential approach and by the usual cluster (prototype molecule) model. The calculations performed on different proton-ordered ice modifications and on lithium hydroxide monohydrate show that the SCMP method provides an electron distribution in very good agreement with the periodic Hartree-Fock results, while finite-size effects induce considerable discrepancies within the cluster approach.

1.  Introduction 

It is a well-established fact that intermolecular forces may sometimes have considerable effects on the properties of individual molecules; this is especially true when one is dealing with molecules in a condensed phase, like molecular liquids or molecular crystals. Whereas quite reliable quantum chemical methods are at our disposal to describe the molecular observables of isolated species or small aggregates, there are no standard methods for studying the effect of condensed-phase intermolecular interactions on the electronic structure.

In recent years a powerful technique, the periodic Hartree-Fock method, has been developed for the description of three-dimensional solids in terms of Bloch orbitals. It offers a unique tool to study, in particular, covalent and ionic crystals. Nevertheless, it has practical limitations for systems with large unit cells (more than about 25 atoms), a limit which is quite easily attained in the case of molecular crystals.

An alternative approach is offered by model Hamiltonian methods, where the building blocks of the crystal are described by taking into account the effective potential of the surroundings. The self-consistent Madelung potential (SCMP) approach corresponds to a specific form of this effective potential, involving the electrostatic interactions, which dominate in strongly polar systems. Although the principles of the SCMP method are quite easy to understand heuristically, it involves a number of approximations which can be justified only by experience.

In the present contribution we undertake a comparison of exact periodic Hartree-Fock and approximate SCMP results on the charge distribution in polar molecular crystals. In particular, the charge distribution of the water molecules has been followed under various conditions, like different ice modifications and crystal hydrates.

After a brief review of the main principles of the periodic Hartree-Fock and SCMP approaches, the most important features of the present implementations are described. The cohesion energies and the atomic net charges are discussed and compared for proton-ordered hexagonal and cubic ice modifications, as well as for LiOH·H2O, lithium hydroxide monohydrate.

2.  Theory 

2.1. Periodic Hartree-Fock method. The Hartree-Fock-Roothaan equations for periodic systems are well known [1-2] and are recalled here only for a better understanding. The crystalline orbitals ψ_i(k; r) are represented as linear combinations of Bloch functions φ_μ(k; r), which are expressed on the basis of the atomic orbitals of each unit cell:

$$\psi_i(\mathbf{k};\mathbf{r}) = \sum_{\mu} c_{\mu i}(\mathbf{k})\,\varphi_\mu(\mathbf{k};\mathbf{r}) \qquad (1)$$

$$\varphi_\mu(\mathbf{k};\mathbf{r}) = N^{-1/2} \sum_{\mathbf{g}} \chi_\mu^{\mathbf{g}}(\mathbf{r})\, \exp(i\,\mathbf{k}\cdot\mathbf{g}) \qquad (2)$$


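In Eq. (2) χ_μ^g denotes the μth atomic orbital of the unit cell characterized by the direct lattice vector g. The expansion coefficients c_μi(k) are calculated by solving, for each reciprocal vector k in the Brillouin zone, the matrix equation:

$$\mathbf{F}(\mathbf{k})\,\mathbf{C}(\mathbf{k}) = \mathbf{S}(\mathbf{k})\,\mathbf{C}(\mathbf{k})\,\mathbf{E}(\mathbf{k}) \qquad (3)$$

in which S(k) is the overlap matrix, E(k) is the diagonal matrix of eigenvalues and F(k) is the Fock matrix:

$$\mathbf{F}(\mathbf{k}) = \sum_{\mathbf{g}} \exp(i\,\mathbf{k}\cdot\mathbf{g})\,\mathbf{F}^{\mathbf{g}} \qquad (4)$$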
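To make Eqs. (3) and (4) concrete, the sketch below builds F(k) and S(k) from real-space blocks for a hypothetical one-dimensional chain and solves the generalized eigenproblem by Löwdin orthogonalization. It is not the CRYSTAL algorithm; the chain parameters alpha, beta and s are illustrative assumptions. For one orbital per cell the band is ε(k) = (α + 2β cos k)/(1 + 2s cos k), which the test checks.

```python
"""Eqs. (3)-(4) for a one-dimensional lattice: build F(k), S(k) and solve."""
import numpy as np


def bloch_sum(blocks, k):
    """Eq. (4): M(k) = sum_g exp(i k g) M^g, lattice vectors g in units of a."""
    return sum(np.exp(1j * k * g) * m for g, m in blocks.items())


def solve_crystal_orbitals(f_k, s_k):
    """Eq. (3): F C = S C E, via symmetric (Lowdin) orthogonalization."""
    s_vals, s_vecs = np.linalg.eigh(s_k)
    x = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.conj().T   # S^(-1/2)
    eps, c_orth = np.linalg.eigh(x.conj().T @ f_k @ x)
    return eps, x @ c_orth


if __name__ == "__main__":
    # Hypothetical single-orbital chain: on-site alpha, neighbour beta, overlap s.
    alpha, beta, s = -0.5, -0.1, 0.2
    F = {0: np.array([[alpha]]), 1: np.array([[beta]]), -1: np.array([[beta]])}
    S = {0: np.array([[1.0]]), 1: np.array([[s]]), -1: np.array([[s]])}
    for k in np.linspace(-np.pi, np.pi, 5):
        eps, _ = solve_crystal_orbitals(bloch_sum(F, k), bloch_sum(S, k))
        print(f"k = {k:+.3f}  eps = {eps[0]:+.5f}")
```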


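Formally, the F^g matrix elements can be written as the sum of one-electron and two-electron contributions, i.e.:

$$F^{\mathbf g}_{\mu\nu} = \langle \chi^{\mathbf 0}_\mu | \hat F | \chi^{\mathbf g}_\nu \rangle = H^{\mathbf g}_{\mu\nu} + G^{\mathbf g}_{\mu\nu} \qquad (5)$$

in which the dummy indices label the atomic orbitals in the elementary cell. The one-electron contribution is the sum of the kinetic T^g_μν and nuclear attraction V^g_μν terms:

$$H^{\mathbf g}_{\mu\nu} = T^{\mathbf g}_{\mu\nu} + V^{\mathbf g}_{\mu\nu} \qquad (6)$$

The two-electron term is the sum of a Coulomb and an exchange term:

$$G^{\mathbf g}_{\mu\nu} = J^{\mathbf g}_{\mu\nu} + K^{\mathbf g}_{\mu\nu} = \sum_{\lambda,\sigma} \sum_{\mathbf l} P^{\mathbf l}_{\lambda\sigma} \sum_{\mathbf m} \left[ (\chi^{\mathbf 0}_\mu \chi^{\mathbf g}_\nu | \chi^{\mathbf m}_\lambda \chi^{\mathbf m + \mathbf l}_\sigma) - \tfrac{1}{2} (\chi^{\mathbf 0}_\mu \chi^{\mathbf m}_\lambda | \chi^{\mathbf g}_\nu \chi^{\mathbf m + \mathbf l}_\sigma) \right] \qquad (7)$$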


The l and m summations in Eq. (7), as well as the g summation in Eq. (4), extend in principle over the infinite set of lattice vectors; in practice the convergence in the transformation from F^g to F(k) is very fast, and the important exchange contributions can be shown to be relatively short-ranged [5]. The Coulomb series, on the contrary, require particular attention: the long-range electron-electron terms must be combined with the corresponding electron-nuclei contributions and summed to infinity by Ewald-type techniques [6].

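In an all-electron approach the electron-nuclei term takes the following form:

$$V^{\mathbf g}_{\mu\nu} = -\int \chi^{\mathbf 0}_\mu(\mathbf r) \sum_{\mathbf h} \sum_{A} \frac{Z_A}{|\mathbf r - \mathbf R_A - \mathbf h|}\, \chi^{\mathbf g}_\nu(\mathbf r)\, d\mathbf r \qquad (8)$$

where the A summation extends to the atoms of the unit cell with charge Z_A. This purely local contribution is treated by standard Ewald techniques.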

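The density matrix elements are obtained after integration over the volume of the Brillouin zone:

$$P^{\mathbf l}_{\mu\nu} = \frac{2}{V_{\mathrm{BZ}}} \int_{\mathrm{BZ}} d\mathbf k\, \exp(i\,\mathbf k\cdot\mathbf l) \sum_i c^*_{\mu i}(\mathbf k)\, c_{\nu i}(\mathbf k)\, \theta\big(\epsilon_F - \epsilon_i(\mathbf k)\big) \qquad (9)$$

In Eq. (9) the c_μi are the eigenvector elements, θ is the step function, ε_F is the Fermi level and ε_i is the ith eigenvalue. Finally, the total electronic energy per unit cell is given by: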


(10) 
The program CRYSTAL [7], developed in Torino, is an ab initio program which allows calculations on one-, two- and three-dimensional periodic systems. In CRYSTAL, standard Cartesian Gaussian functions are used to construct the Bloch functions defined by Eq. (2). The program requires as input the space group of the system to be calculated, the fractional coordinates of the independent atoms, the basis set and the threshold values used for the truncation of the infinite sums. As output it provides the unit cell energy, the wavefunction and related one-electron properties.

2.2. Self-consistent Madelung potential (SCMP) method [10]. In this approach one is interested in the electronic wavefunction of a molecular subunit of the crystal. Following the philosophy of the group function method of McWeeny [8-9], a set of coupled equations can be written down for the N subunits of the crystal:

$$\Big\{ H^0 + \sum_{s} \sum_{\mathbf g}{}^{\prime}\, \langle \Psi_s(\mathbf g) | V | \Psi_s(\mathbf g) \rangle \Big\} |\Psi\rangle = E\, |\Psi\rangle \qquad (11)$$

where the summations run over the space group operations (rototranslations) s and over the lattice translations g, the prime excluding the reference subunit itself. Ψ stands for the wavefunction of the motif in the g = 0 unit cell. H^0 is the Hamiltonian operator of an isolated subunit and V is the interaction operator between different subunits. This interaction includes intersystem nuclear repulsion, electron-nuclear attraction, and electron-electron Coulomb and exchange operators. These equations can be decoupled by taking into account the space group symmetry relationships between the subunit wavefunctions. By denoting by ŝ(g) the appropriate combination of the rototranslation and lattice translation, and by transforming the V interaction operator accordingly, one can define the following effective interaction kernel function:

$$\mathcal{V} = \sum_{\mathbf g} \sum_{s} V^{(s)}(\mathbf g) \qquad (12)$$

The SCMP effective Schrödinger equation for one subsystem takes the simple form:

$$\big\{ H^0 + \langle \Psi | \mathcal{V} | \Psi \rangle \big\} |\Psi\rangle = E\, |\Psi\rangle \qquad (13)$$

This is a typical nonlinear Schrödinger equation, similar to the one used in the reaction field models of solvent effects [11].

In order to complete the SCMP theory we have to define the energy per subunit in the crystal. It is easy to see that the sum of the energies E appearing in the effective Schrödinger equations includes the subunit-subunit interactions twice. Since the subunits are equivalent, it is straightforward to define the energy per subunit, ℰ, as:

$$\mathcal{E} = \langle \Psi | H^0 | \Psi \rangle + \tfrac{1}{2} \langle \Psi | \langle \Psi | \mathcal{V} | \Psi \rangle | \Psi \rangle \qquad (14)$$

As we have already shown in ref. [10], the effective Schrödinger equation (13) can also be derived by applying the variational principle to the above energy functional, so by solving (13) we make the approximate energy functional (14) stationary.
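The nonlinear character of Eq. (13), and the factor 1/2 in Eq. (14), can be seen on a deliberately simple model. In this hypothetical two-level sketch, which is not the SCMP implementation itself, the kernel average ⟨Ψ|𝒱|Ψ⟩ is replaced by λμd, where μ = ⟨Ψ|d|Ψ⟩ plays the role of the subunit dipole; the equation is then solved by fixed-point iteration. At convergence E - ℰ = λμ²/2, and ℰ is stationary with respect to small rotations of Ψ.

```python
"""Toy self-consistent field for Eqs. (13)-(14) (hypothetical two-level model)."""
import numpy as np


def energy_functional(psi, h0, d, lam):
    """Eq. (14) for the toy kernel: <H0> + (1/2) lam mu^2, mu = <psi|d|psi>."""
    mu = psi @ d @ psi
    return psi @ h0 @ psi + 0.5 * lam * mu**2


def scmp_toy(h0, d, lam, tol=1e-12, max_iter=1000):
    """Iterate Eq. (13) with the toy kernel <Psi|V|Psi> = lam * mu * d."""
    _, vecs = np.linalg.eigh(h0)
    psi = vecs[:, 0]
    for _ in range(max_iter):
        mu = psi @ d @ psi
        energies, vecs = np.linalg.eigh(h0 + lam * mu * d)
        new_psi = vecs[:, 0]
        if new_psi @ psi < 0:          # remove the arbitrary sign from eigh
            new_psi = -new_psi
        converged = np.linalg.norm(new_psi - psi) < tol
        psi = new_psi
        if converged:
            return psi, energies[0], energy_functional(psi, h0, d, lam)
    raise RuntimeError("toy SCMP iteration did not converge")


if __name__ == "__main__":
    h0 = np.array([[-0.5, 0.2], [0.2, 0.5]])   # isolated subunit (hypothetical)
    d = np.diag([1.0, -1.0])                   # "dipole" operator (hypothetical)
    psi, e_eig, e_sub = scmp_toy(h0, d, lam=-0.3)
    print("eigenvalue E:", e_eig, " energy per subunit:", e_sub)
```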

3.  Method 

3.1. Evaluation of the Coulomb series in CRYSTAL. In CRYSTAL the direct space approach is used to evaluate the necessary one- and two-electron integrals. The difficult problem of summing the infinite series of the electrostatic interactions appearing in Eqs. (7-8) has been solved in the following way: consider the exact charge density ρ_g(r) of the general crystal cell labelled by the direct lattice vector g. It can be split into a model point charge distribution involving atomic multipoles up to the Lth order and a remainder, so that the total charge density can be written as:

$$\rho(\mathbf r) = \sum_{\mathbf g} \left[ \rho^{\mathrm{model}}_{\mathbf g}(\mathbf r) + \rho'_{\mathbf g}(\mathbf r) \right] = \rho^{\mathrm{model}}(\mathbf r) + \rho'(\mathbf r) \qquad (15)$$

On the one hand, the choice of ρ^model(r) implies that its contribution to the potential in the reference unit cell can be efficiently computed by the Ewald technique. On the other hand, the contribution of the remaining charge distribution ρ'(r) is short-ranged and is therefore evaluated by a direct summation over the unit cells located within a sphere of finite radius, the size of which can be controlled by the user.

The exchange contribution is essentially short-ranged and, except for metallic systems, the convergence of the series is fast. A set of cutoff parameters is used to control the length of the expansion. A detailed discussion of these problems can be found in ref. [6], together with the relevant algorithms used in actual calculations.


3.2. Ab initio implementation of the SCMP method. For ideal periodic systems the effective interaction kernel 𝒱 involves a combination of infinite lattice sums of one- and two-electron integrals. Obviously, the brute-force calculation of such integrals would be excessively expensive. Taking into account the inherently approximate nature of the SCMP method, it seemed desirable to introduce some conceptually simple approximations for the representation of the interaction integrals.

We have chosen the Ruedenberg approximation to decompose the multicenter integrals into at most two-center ones. As a further approximation, we developed these integrals into truncated multipole series, which leads to an atomic multipole expansion of the interaction operator. This philosophy is quite close to that of the CRYSTAL algorithm. By this procedure the Madelung problem is reduced to the analytical calculation of multipole Ewald sums involving the pairs of nuclear positions. The atomic charges in this scheme are the usual Mulliken charges:

$$Q_A = Z_A - \sum_{\mu \in A} (\mathbf{P}\mathbf{S})_{\mu\mu} \qquad (16)$$

while the higher atomic multipoles are defined by a generalization of the above expression. For example, the z-component of the atomic dipole moment is given by:

$$\mu^A_z = -\sum_{\mu \in A} (\mathbf{P}\mathbf{M}_z)_{\mu\mu} \qquad (17)$$

In this expression the matrix M_z is defined by its elements:

(Afr) i./i  =  ^  )  (18) 

\.<r£A 
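The Mulliken charges of Eq. (16) translate directly into a few lines of NumPy. A minimal sketch, assuming a closed-shell density matrix P that already contains the occupation factor 2, and a list mapping each basis function to its atom; both inputs in the usage block are hypothetical.

```python
"""Mulliken net atomic charges, Eq. (16): Q_A = Z_A - sum_{mu in A} (PS)_{mu mu}."""
import numpy as np


def mulliken_charges(p, s, basis_to_atom, nuclear_charges):
    """p: closed-shell density matrix (includes the factor 2), s: overlap matrix,
    basis_to_atom[mu]: index of the atom carrying basis function mu."""
    gross_populations = np.diag(p @ s)
    charges = np.array(nuclear_charges, dtype=float)
    for mu, atom in enumerate(basis_to_atom):
        charges[atom] -= gross_populations[mu]
    return charges


if __name__ == "__main__":
    # Hypothetical polar two-centre bond in a minimal basis (one function per atom).
    s = np.array([[1.0, 0.5], [0.5, 1.0]])
    c = np.array([0.8, 0.4])
    c /= np.sqrt(c @ s @ c)          # normalize the occupied MO with respect to S
    p = 2.0 * np.outer(c, c)         # two electrons in the occupied MO
    print(mulliken_charges(p, s, [0, 1], [1.0, 1.0]))
```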

Actually the method works with a multipole expansion truncated at the dipolar level. In this dipolar approximation one obtains the following expression for the corrected SCMP Fock matrix elements (μ on atom A, ν on atom B), within the framework of closed-shell RHF theory:

$$F_{\mu\nu} = F^0_{\mu\nu} - \tfrac{1}{2} S_{\mu\nu} (V_A + V_B) + \tfrac{1}{2} \left(\mathbf M_{\mu\nu}\cdot\mathbf E^A + \mathbf M_{\mu\nu}\cdot\mathbf E^B\right) \qquad (19)$$

In this latter expression V_A and E^A stand for the total Madelung potential and Madelung field at atomic center A, coming from the surrounding atomic charges and dipoles. The notation M_μν stands for the three Cartesian components of the dipole moment matrices, M_x, M_y and M_z.
4. Applications

Two systems in which the water molecule can be considered as a probe have been studied: on the one hand ice with different structures, and on the other hand the hydrate of lithium hydroxide, LiOH·H2O. The periodic Hartree-Fock calculations can be considered as a reference against which to gauge the results given by the model Hamiltonian SCMP method, as well as those obtained from the prototype molecule (cluster) approach.

Two structures of ice have been considered here. The first is hexagonal ice, which belongs to the P6_3/mmc space group provided that "half hydrogens" are placed in the (4f) positions, corresponding to the statistically disordered orientations of the water molecules. In an actual calculation one has to consider proton-ordered structures containing exclusively "full hydrogen" atoms. Two possible substructures of lower symmetry, hereafter referred to as "ortho" and "para" hexagonal ices, were considered in this work. In the "ortho" structure the dipoles of the water molecules belonging to a given sheet form an angle of 60 degrees, while in the "para" structure the water dipoles are parallel.


                          ΔE (kJ mol-1)   qO (a.u.)
"ortho" ice
  PHF (STO-3G)                -56.7       -0.464
  PHF (6-31G)                 -68.0       -1.109
  SCMP (STO-3G)               -33.4       -0.452
  SCMP (6-31G)               -115.0       -1.141
  cluster (STO-3G)            -33.1       -0.419
  cluster (6-31G)             -57.8       -0.895
"para" ice
  PHF (STO-3G)                -56.9       -0.464
  PHF (6-31G)                 -68.4       -1.109
  SCMP (STO-3G)               -35.6       -0.459
  SCMP (6-31G)               -119.0       -1.153
  cluster (STO-3G)            -12.1       -0.456 (-0.402)
  cluster (6-31G)             -23.4       -0.954 (-0.866)
"cubic" ice
  PHF (STO-3G)                -58.9       -0.467
  PHF (6-31G)                 -74.6       -1.114
  SCMP (STO-3G)               -40.8       -0.461
  SCMP (6-31G)               -127.3       -1.161
  cluster (STO-3G)            -17.1       -0.458 (-0.369)
  cluster (6-31G)             -17.0       -0.953 (-0.838)

Table 1. Cohesive energy ΔE and oxygen net charges qO in three model ices. PHF: periodic Hartree-Fock; SCMP: self-consistent Madelung potential; in parentheses: oxygen net charge of the singly hydrogen-bonded molecule.
The second ice modification, experimentally observed at low pressures, is known as "cubic" ice. Its structure is related to that of high-cristobalite. Half hydrogens are not necessary to preserve the full symmetry of the P4_12_12 space group.

The results of the calculations, performed with both the STO-3G and 6-31G basis sets, are listed in Table 1. The unit cell of hexagonal ice contains twelve atoms belonging to six water molecules, which are shared between several adjacent cells. Within this arrangement it is not possible to perform an SCMP calculation in which the unit cell atoms are taken as the quantum motif. In the SCMP calculations on hexagonal ice reported here, the quantum motif is one water molecule, surrounded by the atomic charges and multipoles of the remaining atoms of the cell. The two clusters simulating hexagonal ice are made of rings of six molecules extracted from the ortho and para unit cells. In the ortho cluster the six molecules are equivalent and the total dipole moment is zero. In the para cluster there are basically two kinds of molecules, differing by the number of hydrogen bonds in which they are involved; the para cluster has a nonzero permanent dipole moment.

The band structures, not reported here, are very flat compared with those of covalent or metallic crystals (cf. ref. [12]). This means that the k-space integration appearing in the evaluation of the density matrix (Eq. 9) can be safely approximated by a discrete sum over the occupied states. In other words, a localized approach is physically consistent for such systems.
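The statement that flat bands allow the k-integral of Eq. (9) to be replaced by a coarse sum can be tried on a toy model. This sketch assumes orthonormal atomic orbitals (S = 1), one filled band, and a hypothetical two-orbital chain with weak inter-cell coupling; it compares a Γ-point-only density matrix with a 64-point mesh, which for this model differ by only about 0.01 in any element.

```python
"""Discretized Eq. (9): k-point sampling of the density matrix for flat bands."""
import numpy as np


def density_matrix(f_blocks, n_k, n_occ, l=0):
    """P^l from a uniform n_k mesh (Gamma included), orthonormal AOs assumed (S = 1)."""
    p = 0.0
    for k in 2.0 * np.pi * np.arange(n_k) / n_k:
        f_k = sum(np.exp(1j * k * g) * m for g, m in f_blocks.items())
        _, c = np.linalg.eigh(f_k)
        occ = c[:, :n_occ]                      # lowest n_occ bands are filled
        p = p + np.exp(1j * k * l) * (occ.conj() @ occ.T)
    return 2.0 * p / n_k


if __name__ == "__main__":
    f1 = np.array([[0.02, 0.01], [0.0, 0.02]])  # weak inter-cell coupling (hypothetical)
    blocks = {0: np.array([[-1.0, 0.3], [0.3, 1.0]]), 1: f1, -1: f1.T}
    p_gamma = density_matrix(blocks, n_k=1, n_occ=1)
    p_dense = density_matrix(blocks, n_k=64, n_occ=1)
    print("max |P(Gamma) - P(64 k)| =", np.abs(p_gamma - p_dense).max())
```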

The interaction energy computed by the periodic Hartree-Fock method contains the electrostatic, induction and exchange contributions but does not account for the dispersion forces. Nevertheless, the cohesive energies calculated at this level with both basis sets have the correct order of magnitude (the experimental binding energy is about 55 kJ mol-1) [13-16] and are almost structure-independent, though cubic ice appears to be overstabilized. They are noticeably underestimated in the cluster calculations, which also show large discrepancies between the three structures; these can be interpreted in terms of the non-equivalence of the water molecules due to the limited size of the clusters. The SCMP method does not take into account the repulsive forces, and therefore the interaction energy is expected to be overestimated with respect to the periodic Hartree-Fock value. In fact, this occurs with the 6-31G basis set, while the STO-3G calculations seem to underestimate the cohesion energy. Nevertheless, it is remarkable that the order of stability of the three ice modifications is the same in both the PHF and SCMP calculations.


               periodic HF    SCMP     cluster
q_Li              0.623       0.700     0.860
q_O (OH)         -0.636      -0.800    -0.693
q_H (OH)          0.068       0.100    -0.084
q_O (H2O)        -0.504      -0.466    -0.619
q_H (H2O)         0.225       0.233     0.227

Table 2. Charge distribution in lithium hydroxide monohydrate (atomic net charges, a.u.). The charges on the LiOH moiety were kept fixed in the SCMP calculation.

The meaning of the atomic net charges in the sense of Mulliken is physically questionable, and these quantities are known to be very sensitive to the nature of the basis set. However, in this series of calculations, in which the same basis set has been used to compute the same system at different levels of representation of the surroundings, they appear to be a suitable tool for the analysis of the wavefunctions. Like the cohesive energy, the oxygen net charge is essentially basis set dependent. The periodic Hartree-Fock and SCMP values are in excellent agreement, while the finite size of the clusters is responsible for the underestimation of this quantity.
The structure of LiOH·H2O has been determined from X-ray experiments by Hermansson and Thomas [17]; the unit cell belongs to the C2/m space group and contains 24 atoms. The nearest Li+ and OH- ions are located in different unit cells, and therefore the SCMP study cannot be performed directly on the unit cell. The SCMP calculations were done for the H2O motif in the fixed charge distribution of the Li+ and OH- ions, represented by point charges. The rather large size of the unit cell makes it impossible to carry out periodic Hartree-Fock calculations on this system with a split-valence basis set, and therefore only STO-3G results are reported in Table 2.

The agreement between the PHF and SCMP charge distributions is less spectacular than for the ice modifications. This slight discrepancy can be attributed to the fact that the LiOH part of the system was not relaxed. In the SCMP model, charge transfer between the H2O and LiOH motifs is not possible, while it seems to be non-negligible according to the PHF (and cluster) calculations.
SUMMARY

SCF 6-31G*-level calculations are presented on the structures and relative energies of the
neutral and zwitterionic forms of glycine and its monohydrate. Major differences are found
between the structures reported here and previously published work.


INTRODUCTION 

The relative energies of the neutral (GN) and zwitterionic (GZ) forms of glycine (G) and
other amino acids have long fascinated experimental and theoretical chemists. The fact that
the GN form is more stable than the GZ form in the gas phase can be rationalized from the
relative gas phase proton affinities (PA) of alkylamines (CH3NH2 = 214 kcal/mol) and
alkylcarboxylate anions (CH3COO(-) = 349 kcal/mol) (refs. 1,2). Without a coulombic
correction term for the attraction between the charged moieties, the following gas phase
reaction would be endothermic by approximately 135 kcal/mol.

RNH2 + RCOOH = RNH3(+) + RCOO(-)

However, in the case of GZ, the separation of the positive and negative charges (ca. 2.4-2.7
Å) would produce a stabilizing energy on the order of 120-135 kcal/mol. The net
result is that the GZ form is predicted to be less stable than the GN form by 0-15 kcal/mol.
This estimate is quantitatively unreliable but indicates that the GZ form is probably less
stable than the GN form by not more than 30 kcal/mol in the gas phase. This energy
difference is inverted in polar solvents or in those crystal configurations which
stabilize the GZ more than the GN form (refs. 3,4). In water, an estimate of the free energy
difference can be made from the pKa differences between protonated alkylamines and neutral
carboxylic acids (4-5 units). This estimate indicates that at neutral pH the zwitterionic
form (GZ) dominates in water. In water, moreover, more species exist in appreciable
concentrations than in the gas phase. These are the carboxylate anion and the
amino-protonated cation: G(-), (-)OOC-CH2-NH2, and GH(+),
HOOC-CH2-NH3(+).
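The gas-phase estimate above is simple arithmetic and can be reproduced in a few lines. The sketch below is illustrative only: it assumes unit point charges in vacuum (Coulomb constant 332.06 kcal·Å/mol, our assumption, not a value stated in the text), uses the proton affinities quoted above, and converts a pKa difference with ΔG = 2.303RTΔpKa at 298 K. Under these assumptions the attraction is about 123-138 kcal/mol for 2.4-2.7 Å, giving a net GZ-GN difference of roughly -3 to +12 kcal/mol, in the same range as, but not identical to, the 0-15 kcal/mol quoted; that sensitivity is one reason the estimate is called quantitatively unreliable. The function names are hypothetical.

```python
import math

# Assumed constants (not from the paper): Coulomb constant for unit point
# charges in vacuum, and the gas constant.
COULOMB_KCAL_ANGSTROM = 332.06   # kcal * Angstrom / mol
R_KCAL = 1.987204e-3             # kcal / (mol * K)


def proton_transfer_energy(pa_carboxylate: float, pa_amine: float) -> float:
    """Energy (kcal/mol) of RNH2 + RCOOH -> RNH3(+) + RCOO(-) with separated ions."""
    return pa_carboxylate - pa_amine


def coulomb_attraction(distance_angstrom: float) -> float:
    """Magnitude (kcal/mol) of the attraction between a +1 and a -1 point charge."""
    return COULOMB_KCAL_ANGSTROM / distance_angstrom


def zwitterion_minus_neutral(pa_carboxylate: float, pa_amine: float,
                             distance_angstrom: float) -> float:
    """Crude E(GZ) - E(GN): proton-transfer cost minus point-charge stabilization."""
    return (proton_transfer_energy(pa_carboxylate, pa_amine)
            - coulomb_attraction(distance_angstrom))


def pka_difference_to_free_energy(delta_pka: float, temperature: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a pKa difference."""
    return math.log(10.0) * R_KCAL * temperature * delta_pka


if __name__ == "__main__":
    print("proton transfer:", proton_transfer_energy(349.0, 214.0))
    for r in (2.4, 2.7):
        print(f"r = {r} A: attraction {coulomb_attraction(r):.1f}, "
              f"E(GZ)-E(GN) {zwitterion_minus_neutral(349.0, 214.0, r):.1f}")
    for d in (4, 5):
        print(f"dpKa = {d}: {pka_difference_to_free_energy(d):.1f} kcal/mol")
```

A 4-5 unit pKa difference corresponds to roughly 5.5-6.8 kcal/mol in free energy at room temperature in this sketch.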

The problem of theoretically treating the energy difference between the GZ and GN
species in the gas and solution phases has been approached by various investigators in
recent years. We do not cite earlier CNDO-INDO work, since these methods were
intrinsically incapable of yielding reliable quantitative results. More reliable semiempirical
methods have now been developed which will certainly be used in molecular modeling
schemes, e.g. AM1 (ref. 5). However, it is still not clear just how accurate these will be for
modeling interactions of water with neutral or charged molecules (ref. 6). Most of the recent
studies on GN and GZ have been done using ab initio methods (refs. 7-10), even
though a recent INDO study of a related system has been reported (ref. 11). Although the
expression "ab initio" gives the impression of quantitative reliability, computations on a system
of this size cannot be done at the near Hartree-Fock limit and fully corrected for
correlation effects. Ab initio computations on polyhydrated GN and GZ species require the
use of modest basis sets and the neglect of correlation energy corrections. Since the GZ-GN gas
phase energy difference is experimentally unknown, the usual practice of enlarging the basis
set until one obtains what is considered a "correct" result cannot be
followed. Even so, smaller basis sets (STO-3G, DZ) are not expected to be
particularly satisfactory, and one might begin to approach something reasonable at the DZP
level. It is known that at the 6-31G* SCF level, ion-molecule interactions of small systems
accidentally mimic larger basis set, correlation-corrected calculations (ref. 12). At the
6-31G SCF level, the computation of GZ and GN with a number (1-5) of water molecules
was found to be technically possible in our work (ref. 13). At the 6-31G* SCF level, smaller levels
of hydration can be computed.

With regard to modeling GZ and GN in the presence of a large number of water
molecules, the usual (refs. 14,15) but not exclusive (ref. 10) practice has been to use Monte
Carlo modeling. Monte Carlo schemes, however, are based on pairwise
parameters obtained from GZ and GN single-water interaction potentials. These potentials
are usually not tested against direct ab initio calculations for the small water clusters for
which ab initio calculations are now possible. The work presented here and to be published
elsewhere (ref. 13) will lead to such comparisons. One of the questions discussed here is
whether earlier calculations, in which elaborate optimizations could not be done, obtained
the correct structures and interaction energies for water-GN and water-GZ interactions.

METHODOLOGY 

The computations reported here were carried out with the program MUNGAUSS (ref.
16) on the IBM 3090 at CIRCE, Orsay, France. All computations were carried out at the
SCF level, and geometries were gradient-optimized under a Cs symmetry restraint.

RESULTS  AND  DISCUSSION 

The relative energies of the GN and GZ forms are shown in Fig. 1. In the deprotonated
form, G(-), the most stable conformer is actually a staggered configuration (not shown), which
is more stable by 13.4 kcal/mol than the eclipsed form shown in Fig. 1. This relatively large
conformational energy difference probably results from large repulsions between the COO(-)
moiety and the nitrogen lone pair in the eclipsed conformer. In the staggered conformer,
H-H repulsions are reduced and the oxygen atoms and NH protons are in an attractive
configuration. The energy value shown in Fig. 1 for the G(-)
form (373 kcal/mol) represents its proton affinity (PA) relative to GN. The staggered
form of G(-) has a PA of about 360 kcal/mol, which is close to the value cited above
for carboxylate anions (about 350 kcal/mol). However, quantitative accuracy in the
estimation of PAs cannot be attained at this basis set level, especially in the absence of
correlation and zero-point energy corrections.

Of the various GZ conformers, the GZ shown in Fig. 1 is the most stable and
displays an unusually short single hydrogen bond (1.547 Å) between the ammonium and
carboxylate moieties. This was also found in previous work (ref. 7). A staggered
conformer (not shown) having a bifurcated -HNH···O-C double hydrogen bond is only 3 kcal/mol less
stable. This small energy difference indicates that rotation about the C-NH3(+) bond might
be easier than expected (see refs. 6,17), especially for a multihydrated species.


Fig. 1 Schematic representation of the proton transfer coordinate for glycine
in its various forms. The energies shown were obtained at the SCF 6-31G* level, optimized
in Cs symmetry. The energy of the most stable GN form, GN(a), is -282.82596 au. The dipole
moments are given in debyes.

The GZ structure displays only a shallow minimum on the hypersurface at this basis set level.
We found no minimum using basis sets smaller than 6-31G (STO-3G, 3-21G, 4-31G),
although previous work (ref. 7) indicated one. In any case, the barrier for proton transfer is
computed to be less than 1 kcal/mol in the gas phase. Therefore, the existence of this
species in the gas phase is dubious. A possible verification of the energy difference between
the GZ and GN species would be to measure the gas phase activation energy for deuterium
scrambling in ND2-CH2-COOH. However, the synthesis of this molecule in
isotopically pure form would be difficult.

The relative stabilities of the GN conformers are shown in Fig. 1. These forms are
structurally related to the most stable GZ form, which has an intramolecular hydrogen bond.
The GN(a) conformation is the most stable, having a slightly shorter intramolecular H-bond
(2.014 Å) than found in the GN(b) and GN(c) forms. These latter two structures are only
2.9 and 2.4 kcal/mol less stable than GN(a). The GN(a) form is 25 kcal/mol more stable
than the GZ form. This energy difference has been estimated previously in the literature, in
some cases without full geometry optimization. For instance, at the optimized 6-31G level
this value is 22 kcal/mol, whereas a previous non-optimized value was 43 kcal/mol (ref. 8).
On the other hand, the optimized 4-31G value is 29 kcal/mol (ref. 7). These comparisons
show the importance of geometry optimization in estimating the GN-GZ energy difference.
The effect of correlation on this energy difference is currently under study. However, based
on previous experience, we estimate that the GZ-GN energy difference will be in the
20 kcal/mol region at the best level of ab initio treatment.

We have not determined the rotational barrier about the C-OH bond for the conversion
of GN(a) to GN(b), nor for the conversion between GN(b) and GN(c). The final structure,
GN(d), is the least stable, by 10.4 kcal/mol. This form does not have any intramolecular
OH···O=C hydrogen bond to stabilize its conformation.

Space does not permit a discussion of the structures of the hydrated GN and GZ complexes
containing up to 5 molecules of water. The two most stable
structures of the GZ monohydrate are GZ1 and GZ2 (Fig. 2). These structures
have nearly the same energies. The first structure, GZ1, has an unusual water-bridged
configuration between the ammonium and carboxylate moieties. This hydrated structure
still maintains the intramolecular hydrogen bond found in water-free GZ. This doubly
bridged structure has not been reported in previous calculations (e.g. ref. 9), in which a bifurcated
complexation of water with the carboxylate function is found. However, our level of
theoretical treatment finds that the latter structure is only 1 kcal/mol less stable than the GZ1
structure. The second most stable structure, GZ2, has only a single bonded water. In any
case, these structures are of comparable stability, and their relative energies may be
sensitive to basis set improvements beyond the DZP level (e.g. 6-31+G*, in which diffuse functions
are employed that would better describe the anionic portion of GZ).
Fig. 2 The SCF 6-31G* structures of the two most stable forms of the monohydrated
GZ and GN species. Only the hydrogen-bonded structural components are shown.

For the GN monohydrates, the two most stable structures were also determined
(Fig. 2). In this case the most stable, GN1, is singly hydrogen bonded and structurally
related to GZ2. The GN2 configuration is bifurcated with respect to the complexation of
water to the -COOH moiety. Energetically, the most important feature of the calculation
is that the GZ-GN energy difference decreases from 25 kcal/mol in the non-hydrated form
to 21 kcal/mol in the monohydrate. Therefore, at the monohydrate level, the GN form is
still more stable than the GZ. As expected, this energy difference progressively decreases
with the number of water molecules. Our preliminary analysis shows that with 4-5
molecules of water the hydrated GZ and GN energies become nearly the same (ref. 13).
DOUBLE PROTON TRANSFER STUDIES IN CARBOXYLIC ACID DIMERS
SUMMARY

The potential energy curves for the double proton transfer in propionic, butyric, valeric,
benzoic and o-chlorobenzoic acid dimers have been calculated by the modified all-valence INDO
approach. The calculations have been done for the isolated centrosymmetric dimers and for the
same systems in the crystal lattice. The role of substituents in determining the shape of the potential is discussed.

INTRODUCTION 

Formic and acetic acids occur in the crystalline state as hydrogen-bonded linear polymers,
whereas the higher CH3(CH2)nCOOH homologs exist in the solid as centrosymmetric
dimers (refs. 1-5). The dimeric systems have an important common feature, i.e., an
eight-membered ring consisting of the two O-H···O hydrogen bridges supplemented by the two
additional carbon atoms. These systems are relatively simple and can be considered as models
for double proton transfer in other, structurally more complicated and biologically
important compounds.

It is generally assumed that in the gaseous state the protons can move within the
hydrogen bonds between the two symmetrical potential minima of equivalent tautomers. In
a condensed medium, however, because of interactions with the environment, the initial
and final states may be trapped in some configuration. Thus, the role of the crystal field in
the mechanism of this reaction may be significant but is still not clearly understood. In such
a situation, a comparison of the potential energy curves for proton displacement in the
isolated gaseous dimers and in the crystal is of interest.

THEORETICAL  OUTLINE 

The charge distribution over the molecule under consideration in the crystal lattice has
been evaluated by an iterative procedure. First, we calculate the atomic charges for the free
molecule and then compute the potential created by the crystal lattice. In the next step,
new charges are found in the presence of the previously calculated potentials. This procedure is
repeated until the charges are stable to a given accuracy.
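The iterative procedure above is a fixed-point iteration, which can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the lattice potentials are taken to be linear in the charges, V_i = Σ_j M_ij q_j, with a precomputed, charge-independent matrix M (the idea behind Eq. (2c) below), and the INDO charge calculation is replaced by a hypothetical linear response model (`toy_linear_model`) purely so that the loop can run.

```python
from typing import Callable, List, Sequence, Tuple

ChargeModel = Callable[[Sequence[float]], List[float]]


def lattice_potentials(lattice_matrix: Sequence[Sequence[float]],
                       charges: Sequence[float]) -> List[float]:
    """V_i = sum_j M_ij q_j for a precomputed, charge-independent matrix M."""
    return [sum(m_ij * q_j for m_ij, q_j in zip(row, charges)) for row in lattice_matrix]


def self_consistent_charges(charge_model: ChargeModel,
                            lattice_matrix: Sequence[Sequence[float]],
                            tol: float = 1e-8,
                            max_iter: int = 200) -> Tuple[List[float], int]:
    """Iterate charges -> lattice potentials -> new charges until converged."""
    # Step 1: charges of the free molecule (no external potential).
    charges = charge_model([0.0] * len(lattice_matrix))
    for iteration in range(1, max_iter + 1):
        # Step 2: potential created by the crystal lattice for the current charges.
        potentials = lattice_potentials(lattice_matrix, charges)
        # Step 3: new charges of the molecule placed in that potential.
        new_charges = charge_model(potentials)
        if max(abs(a - b) for a, b in zip(new_charges, charges)) < tol:
            return new_charges, iteration
        charges = new_charges
    raise RuntimeError("charges did not converge within max_iter iterations")


def toy_linear_model(potentials: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the INDO step: q_i = q0_i - k_i * V_i."""
    free_charges = (0.4, -0.4)
    response = (0.1, 0.1)
    return [q0 - k * v for q0, k, v in zip(free_charges, response, potentials)]


if __name__ == "__main__":
    toy_matrix = [[0.0, 0.5], [0.5, 0.0]]  # hypothetical lattice sums
    q, n_iter = self_consistent_charges(toy_linear_model, toy_matrix)
    print(q, n_iter)
```

In this toy the charges grow from ±0.400 to ±0.4/0.95 ≈ ±0.421, i.e. the surrounding field enhances the polarity of the molecule; in a real application `charge_model` would wrap a quantum-chemical calculation with the potentials added to the one-electron Hamiltonian.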
The atomic net charges have been evaluated by the quantum mechanical modified all-valence
INDO method (ref. 6). The calculation of the electrostatic potentials is more
complicated, however, because it requires a summation over the whole crystal lattice. This is an infinite,
slowly convergent series:

$$V_{iM} = \sum_{j \notin M} \frac{q_j}{r_{ij}} \qquad (1)$$

where

V_iM is the electrostatic potential on the i-th atom of the M-th molecule,
q_j denotes the charge on the j-th atom,
r_ij is the corresponding interatomic distance.


The summation in Eq. (1) may be partitioned into the following three steps:

index j denotes the summation over the atoms of one molecule,

index m denotes the summation over all molecules in the unit cell,

index k denotes the summation over all cells in the lattice.


Changing the summation order in Eq. (2a), we obtain

The inconvenient infinite sum is independent of the atomic charges and can be calculated
by using the Ewald method (refs. 7-8). In order to evaluate the electrostatic potentials in
the iterative procedure described above, we therefore have to perform only the following summation:


$$V_{iM} = \sum_{m} \sum_{j \in m} q_j \, M_{ij} \qquad (2c)$$
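The point of Eq. (2c) is that the expensive lattice sum is done once, independently of the charges. The authors' Ewald implementation is not given here; the following is a textbook Ewald sketch under our own assumptions: a cubic cell, point charges in Gaussian units (q/r), and an electrically neutral cell so that the k = 0 term can be dropped. `alpha`, `n_real` and `n_recip` are illustrative convergence parameters, not values from the paper. As a check, it reproduces the NaCl Madelung constant (1.7476) for a nearest-neighbour distance of 1.

```python
import itertools
import math


def ewald_matrix(positions, box, alpha=1.5, n_real=2, n_recip=5):
    """Charge-independent lattice sums M[i][j] for a cubic cell of side `box`.

    The potential at site i from every charge in the infinite lattice, except
    site i itself, is V_i = sum_j M[i][j] * q_j (Gaussian units, q/r).
    Assumes a neutral cell, so the k = 0 reciprocal term is omitted.
    """
    n = len(positions)
    volume = box ** 3
    images = list(itertools.product(range(-n_real, n_real + 1), repeat=3))

    # Reciprocal vectors with weights 4*pi/V * exp(-k^2 / 4 alpha^2) / k^2.
    kvecs = []
    for h in itertools.product(range(-n_recip, n_recip + 1), repeat=3):
        if h == (0, 0, 0):
            continue
        k = [2.0 * math.pi * c / box for c in h]
        k2 = k[0] ** 2 + k[1] ** 2 + k[2] ** 2
        weight = 4.0 * math.pi / volume * math.exp(-k2 / (4.0 * alpha ** 2)) / k2
        kvecs.append((k, weight))

    m = [[0.0] * n for _ in range(n)]
    for i, ri in enumerate(positions):
        for j, rj in enumerate(positions):
            d = [rj[a] - ri[a] for a in range(3)]
            total = 0.0
            # Real-space part: screened (erfc) Coulomb sum over nearby images.
            for s in images:
                if i == j and s == (0, 0, 0):
                    continue
                r = math.sqrt(sum((d[a] + s[a] * box) ** 2 for a in range(3)))
                total += math.erfc(alpha * r) / r
            # Reciprocal-space part: smooth Gaussian compensating charges.
            for k, weight in kvecs:
                total += weight * math.cos(k[0] * d[0] + k[1] * d[1] + k[2] * d[2])
            # Remove the interaction of site i with its own Gaussian cloud.
            if i == j:
                total -= 2.0 * alpha / math.sqrt(math.pi)
            m[i][j] = total
    return m


def site_potentials(m, charges):
    """Eq. (2c)-style step: V_i = sum_j M_ij q_j, cheap to repeat for new charges."""
    return [sum(mij * qj for mij, qj in zip(row, charges)) for row in m]


if __name__ == "__main__":
    # Rock-salt conventional cell, side 2, nearest-neighbour distance 1.
    na = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
    cl = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
    m = ewald_matrix(na + cl, box=2.0)
    v = site_potentials(m, [1.0] * 4 + [-1.0] * 4)
    print(v[0], v[4])  # close to -1.7476 and +1.7476
```

Only `site_potentials` has to be re-evaluated at each cycle of the iterative charge procedure; the matrix is computed once per crystal structure.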


RESULTS  AND  DISCUSSION 

The results of our calculations (Table 1) show that the barrier height depends strongly
on the substituents. In particular, the benzene and o-chlorobenzene rings decrease the barrier, leading
to a symmetrical potential in the former case. No symmetrical potential for the double
proton displacement was found for the other systems under consideration. On the other hand,
the length of the aliphatic chain does not seem to influence the barrier. These conclusions are based
on the assumption that the hydrogen bond length is almost the same in all the systems under
consideration.


Table 1: Molecular parameters and barrier height values [eV] for carboxylic
acid dimers.

Acid               Dimer    H-bond length [Å]   Difference* [Å]   Barrier, gaseous [eV]   Barrier, crystal [eV]
Propionic          Ref. 1   2.64                0.09              1.42                    1.42
Butyric            Ref. 2   2.62                0.12              1.10                    0.95
Valeric            Ref. 3   2.64                0.09              1.30                    1.35
Benzoic            Ref. 4   2.64                0.01              0.28                    0.32
o-Chlorobenzoic    Ref. 5   2.63                0.09              0.83                    0.79

* Difference [Å] between the C-O and C=O bond lengths.
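To relate the barriers of Table 1 to the kcal/mol values discussed at the end of this section, and to quantify the crystal-field effect, the tabulated entries can be converted (1 eV = 23.0605 kcal/mol) and compared directly. This is only a reading aid for the published numbers: the relative change from gas to crystal ranges from about -14% (butyric) to +14% (benzoic) with no systematic sign, and the benzoic gas-phase barrier of 0.28 eV corresponds to about 6.5 kcal/mol. The names below are illustrative.

```python
EV_TO_KCAL = 23.0605  # kcal/mol per eV

# Barrier heights in eV from Table 1: (gaseous dimer, dimer in the crystal).
TABLE_1_BARRIERS = {
    "propionic": (1.42, 1.42),
    "butyric": (1.10, 0.95),
    "valeric": (1.30, 1.35),
    "benzoic": (0.28, 0.32),
    "o-chlorobenzoic": (0.83, 0.79),
}


def crystal_effect_percent(gas_ev: float, crystal_ev: float) -> float:
    """Relative change (%) of the barrier on going from the gas phase to the crystal."""
    return 100.0 * (crystal_ev - gas_ev) / gas_ev


if __name__ == "__main__":
    for acid, (gas, crystal) in TABLE_1_BARRIERS.items():
        print(f"{acid:16s} gas {gas * EV_TO_KCAL:5.1f} kcal/mol, "
              f"crystal {crystal * EV_TO_KCAL:5.1f} kcal/mol, "
              f"change {crystal_effect_percent(gas, crystal):+5.1f} %")
```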


The potential energy curve for the benzoic acid dimer seems to be nearly symmetric,
and its barrier is the lowest of all the systems calculated. This result agrees with the recent
experimental results of Hochstrasser and Trommsdorff (ref. 9) showing that the acid protons
are delocalized within an approximately symmetric double-well potential (Fig. 1).




Fig. 1: Potential energy curves for the double proton transfer in the benzoic
acid and o-chlorobenzoic acid dimers.


This was, in fact, the first observation of proton delocalization in carboxylic acid dimers in
the condensed phase, and it seems to correlate well with the difference between the C-O
and C=O bond lengths (Table 1). The potential energy maps for the double proton transfer
in the gaseous and crystalline benzoic acid dimer (Fig. 2) show that the two hydrogen-bond protons
do not move independently. The correlated double proton transfer has also been confirmed
by the results of our previous calculations (ref. 10).


Fig.  2:  Potential  energy  [eV]  map  for  the  double  proton  transfer  in  the 
crystalline  benzoic  acid. 

The potential energy curve for the torsional rotation of the eight-membered ring suggested
by Furic (ref. 11) has been calculated for benzoic and o-chlorobenzoic acids (Fig. 3).
According to the calculations of Grabowski and Krygowski (ref. 12), this kind of tautomeric
transformation in the crystal lattice seems to be possible. In our case, however, as the
potential is positive for some rotation angles, the Furic mechanism of tautomerization should
be excluded.

It should be noted that our computational method may be much too simplified for
quantitative considerations; however, it seems to be conclusive for comparative discussion.
Furthermore, molecular geometry optimization may influence the barrier height to a great
extent. Nevertheless, our experience (ref. 13) suggests that the results of our calculations may be
useful for qualitative considerations, including experimental investigations of these important
tautomerization reactions.


(Fig. 3 panels: benzoic acid; o-chlorobenzoic acid. Curves: rotation in gas, rotation in crystal. Horizontal axis: rotation angle [degree].)


Fig.  3:  Potential  energy  curves  for  hindered  rotation  of  the  eight-membered 
ring  in  the  gaseous  and  crystalline  benzoic  acid  and  o-chloro-benzoic  acid 
dimer. 


In general, our calculations indicate that crystal lattice effects lower the potential
barrier for linear hydrogen-bonded systems (ref. 14), whereas no essential
changes were found for the carboxylic acid dimers. In agreement with Nakamura and
Hayashi (ref. 15), these effects reduce the barrier height by ca. 15% and seem unable
to explain the experimentally observed low potential energy barrier of 1 kcal/mol for the
crystalline benzoic acid dimer (ref. 16), whereas the calculated activation energy was estimated
to be 5 kcal/mol (ref. 17) or more.
DISCUSSION


PERAHIA - Do you think that conformational relaxation may be important in lowering
the barrier to the double proton transfer associated with the rotation of the eight-membered
ring?

CHOJNACKI - I am convinced that relaxation plays an important role in changing the
potential energy curve for proton transfer reactions. However, in our comparative
studies we have scaled the total energy, so the results should
still be conclusive.


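ANGYAN - 1. The self-consistent Madelung potential approach you use should
be applied very carefully with respect to the energy expression. In effect, the
perturbation in the Hamiltonian due to the crystal field is

$$\hat V = \int \hat\rho(\mathbf r)\, V(\mathbf r)\, d\mathbf r, \qquad V(\mathbf r) = \int G(\mathbf r,\mathbf r')\, \langle \Psi | \hat\rho(\mathbf r') | \Psi \rangle \, d\mathbf r'$$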

which is non-linear because of the appearance of the expectation value. This non-linear
character is reflected by the fact that the energy expression (energy per molecule in
the crystal) is

$$E = \langle \Psi | \hat H^0 + \tfrac{1}{2} \int d\mathbf r \, V(\mathbf r)\, \hat\rho(\mathbf r) | \Psi \rangle$$

which differs from your energy expression by a factor of 1/2 in front of the interaction
term (see J.G. Angyan & B. Silvi, J. Chem. Phys. 86 (1987) 6957).

Although your wavefunction should be correct, the energies should be corrected by the
above-mentioned factor. This may affect the results, and the conclusions concerning
the negligible effect of the crystal field may have to be revised.

2. Your model corresponds to a simultaneous proton transfer in the
crystal. Physically, it is perhaps more reasonable to consider such a process as a
"defect" formation, where only one pair of H-bonds reorganizes and creates an ionic
defect. We recently treated a similar model for proton conduction in oxonium
perchlorate. Can you comment on the difference between the "simultaneous" and
"point-defect" proton transfer models?
CHOJNACKI - 1. It may not be clear from the text, but we have in fact taken
into account the non-linear factor of 1/2 in our computer program for the energy
evaluation.

2. Our calculations on the defect mechanism are in progress.
However, it now seems that the proton transfer reaction is a correlated process.
MODELLING OF THE DISULFIDE BRIDGE IN PROTEINS:

AB INITIO-CI STUDIES OF S2H2 AND S2(CH3)2

Michel  LOOS 

Laboratoire de Chimie Théorique, U.A. 510 CNRS, Université de Nancy-1, BP 239, F-54506
Vandoeuvre-lès-Nancy Cedex, France.


SUMMARY 

We report here the potential surfaces around the S-S bond for S2H2 and S2(CH3)2,
using ab initio CI calculations with a 4-31G basis set.


INTRODUCTION 


The conformation of the disulfide bond in proteins plays a key role in their biological properties [1].
Much theoretical work has been done on the rotational barriers about the S-S bond [2,3].
We calculated the potential surface in the ground state and in the lowest excited state in
order to evaluate the possibility of radiation damage to proteins.


We used two models for the disulfide bond, the S2H2 (cf. Fig. 1a) and S2(CH3)2 (cf. Fig. 1b)
systems, and we report here the torsional potential surfaces around the S-S bond for both of them
in the ground state and the lowest excited state.


METHOD 

Ab initio SCF calculations have been carried out using the HONDO program [4] at the 4-31G*
basis set level. The coefficients of the d-type orbitals, which were optimized on H2S, are given in
Table I.


Expt. 

Coef.

d  functions 

0.818916 

6.357851 


0.262078 

0.759561 

Table  I 


All geometries were gradient-optimized at the SCF level with one frozen parameter (θ), and
the optimization was stopped when the gradients in internal coordinates were less than 5·10^-4.

CI calculations including up to triple excitations were then carried out with the
same program for both the ground state and the lowest excited state.

Every potential curve was fitted with a function of the type

$$E(\theta) = A + B\cos\theta + C\cos 2\theta + D\cos 3\theta \qquad (1)$$

which is the form used in most molecular mechanics programs.
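Because Eq. (1) is linear in A, B, C and D, the fit is an ordinary linear least-squares problem. The sketch below solves the normal equations in plain Python; it is an illustration, not a claim about how the author performed the fit, and all function names are hypothetical. As a check, it regenerates a curve from the S2H2 HF coefficients of Table III and recovers them. It also shows that the cis and trans energies relative to 90° are B + 2C + D and -B + 2C - D, which for those coefficients give 9.36 and 7.64 (kcal/mol, judging by the agreement with the barriers reported in the Results).

```python
import math
from typing import List, Sequence


def torsion_basis(theta_deg: float) -> List[float]:
    """Basis functions of Eq. (1): 1, cos(theta), cos(2 theta), cos(3 theta)."""
    t = math.radians(theta_deg)
    return [1.0, math.cos(t), math.cos(2.0 * t), math.cos(3.0 * t)]


def torsion_energy(theta_deg: float, coeffs: Sequence[float]) -> float:
    """E(theta) = A + B cos(theta) + C cos(2 theta) + D cos(3 theta)."""
    return sum(c * f for c, f in zip(coeffs, torsion_basis(theta_deg)))


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_torsion(angles_deg: Sequence[float], energies: Sequence[float]) -> List[float]:
    """Least-squares A, B, C, D via the normal equations (needs >= 4 distinct angles)."""
    rows = [torsion_basis(t) for t in angles_deg]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    atb = [sum(r[i] * e for r, e in zip(rows, energies)) for i in range(4)]
    return solve_linear(ata, atb)


if __name__ == "__main__":
    s2h2_hf = (4.19, 0.65, 4.25, 0.21)   # Table III, S2H2 - HF
    angles = list(range(0, 181, 15))
    curve = [torsion_energy(t, s2h2_hf) for t in angles]
    print("recovered A, B, C, D:", [round(c, 4) for c in fit_torsion(angles, curve)])
    e90 = torsion_energy(90, s2h2_hf)
    print("cis barrier:", round(torsion_energy(0, s2h2_hf) - e90, 2))     # B + 2C + D
    print("trans barrier:", round(torsion_energy(180, s2h2_hf) - e90, 2))  # -B + 2C - D
```

With real data, the input would be the computed energies at a set of frozen torsion angles; at least four distinct angles are needed to determine the four coefficients.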


RESULTS 

These calculations give the rotational barriers of both compounds: ΔE1 = 11.5
kcal/mol and ΔE2 = 5.7 kcal/mol for the cis (eclipsed) and trans (planar) forms of S2(CH3)2, respectively, and ΔE1 = 9.4
kcal/mol and ΔE2 = 7.6 kcal/mol for S2H2. For S2H2 these values are essentially the same with or without CI,
since the correlation effect is very weak, but for S2(CH3)2 the differences are important. Including
CI, we get ΔE1 = 9.4 kcal/mol and ΔE2 = 6.0 kcal/mol. The reduction of the rotational
barrier of the eclipsed form is due to the stabilization of some hyperconjugated forms in that
conformation. The differences between our two models are now rather small.


                     E(HF)           E(CI)           E(CI) excited
S2H2 (cis)           -795.3661817    -795.6327498    -795.2627430
S2H2 (90°)           -795.3811301    -795.6473034    -795.2762887
S2H2 (trans)         -795.3689627    -795.6354249    -795.3147800
S2(CH3)2 (cis)       -873.3216696    -873.6415101    -873.2168834
S2(CH3)2 (90°)       -873.3399725    -873.6565262    -873.2274197
S2(CH3)2 (trans)     -873.3308588    -873.6470410    -873.2804258

Table II: Absolute energies (in hartrees) of the critical points.
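The barriers quoted in the Results follow directly from Table II: ΔE1 = E(cis) - E(90°) and ΔE2 = E(trans) - E(90°), converted with 1 hartree = 627.5095 kcal/mol. The short sketch below recomputes them from the tabulated energies (names are illustrative). It gives 11.5/5.7 (HF) and 9.4/6.0 (CI) kcal/mol for S2(CH3)2 and 9.4/7.6 (HF) for S2H2, while the S2H2 CI energies give about 9.1/7.5 kcal/mol, i.e. only a small CI effect for that molecule.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree

# Absolute energies (hartree) from Table II: cis (0 deg), 90 deg, trans (180 deg).
TABLE_II = {
    ("S2H2", "HF"): (-795.3661817, -795.3811301, -795.3689627),
    ("S2H2", "CI"): (-795.6327498, -795.6473034, -795.6354249),
    ("S2(CH3)2", "HF"): (-873.3216696, -873.3399725, -873.3308588),
    ("S2(CH3)2", "CI"): (-873.6415101, -873.6565262, -873.6470410),
}


def rotational_barriers(cis: float, at_90: float, trans: float):
    """(dE1, dE2) in kcal/mol: cis and trans energies relative to the 90 degree minimum."""
    return ((cis - at_90) * HARTREE_TO_KCAL, (trans - at_90) * HARTREE_TO_KCAL)


if __name__ == "__main__":
    for (molecule, level), energies in TABLE_II.items():
        de1, de2 = rotational_barriers(*energies)
        print(f"{molecule:9s} {level}: dE1 = {de1:4.1f}, dE2 = {de2:4.1f} kcal/mol")
```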
The lowest excited state corresponds to a transition of the type π*(S-S) → σ*(S-S).

Fig. 2 shows the potential curve of S2H2 for both the ground state and the lowest
excited state. Note how the positions of the minima and maxima change between the
ground state and the excited state. Furthermore, the minimum of the excited-state potential
corresponds to a maximum of the ground-state potential.


Figure 2: Potential surface of S2H2
-□-: ground state
··○··: excited state


Figure 3: Potential surface of S2(CH3)2
-□-: ground state (HF)
··○··: ground state (CI)
--△--: excited state


Figure 4: Variation of the S-S distance with the torsional angle θ for S2H2 and S2(CH3)2 (··△··)

The same observations can be made in Fig. 3 for S2(CH3)2.
The coefficients of Eq. (1) are given in Table III.
                  A      B      C      D
S2H2 - HF         4.19   0.65   4.25   0.21
S2H2 - CI         4.19   0.65   4.25   0.21
S2(CH3)2 - HF     4.22   1.92   4.29   0.97
S2(CH3)2 - CI     3.85   0.70   3.85   1.02

Table III: Coefficients of the fitted functions.

We also report in Fig. 4 the variation of the S-S distance R(S-S) with the torsional angle θ in
S2H2 and S2(CH3)2. This distance depends strongly on the torsional angle θ. Spectroscopic
techniques such as EXAFS, which can determine specific distances in proteins precisely [5],
could then give information about that angle.

The change in the topology of the potential curves of the disulfide bonds between the ground
and first excited states can induce a change in the absolute conformation around these
bonds under irradiation. Furthermore, since these compounds exhibit a strong natural circular
dichroism [6], one can hope to increase the relative concentration of one of the enantiomers
by irradiation with circularly polarized light.

CONCLUSION 

The topological change in the torsional potential between the ground state and the lowest
excited state of disulfides can lead to an inversion of the absolute configuration of these disulfides
under irradiation. This inversion could be used to modify the relative concentration of the
two enantiomers.
SUMMARY

The far infrared spectra of alkali halide salts in solvents of high dielectric
constant (CH3OH, CH3CN) are investigated in the frequency range 20-600 cm-1.
These spectra are characterized by a complex band shape exhibiting several
absorption peaks. Using a chemical model of the electrolytic solution together with
Mori theory, it is possible to reproduce the band shape and to
identify the molecular motions at the origin of the absorption peaks.


INTRODUCTION 

After a century of investigations on electrolytic solutions, the quantitative
description of all the microscopic processes involved in ionic solvation is not
yet clearly established. The reason mainly lies in the large variety of molecular
entities that constitute the solution (free ions, ion pairs of different structures,
higher-order ionic aggregates), each characterized by its own state of solvation.
In the last two decades, much information on the static and dynamic aspects of
ionic solvation has been obtained with the help of conductivity and permittivity
measurements, optical and magnetic spectroscopies and, more recently, X-ray and
neutron scattering: for reviews see refs. 1-3. From a theoretical standpoint, two
major advances have greatly enlarged our understanding of ionic solutions: (i) the
emergence of a statistical theory of ionic solutions at equilibrium based on the
correlation function formalism, and (ii) the extensive use of computer simulations
(MC and MD). For a report on the state of the art see ref. 4.

The present paper is concerned with the dynamics of ionic solvation as revealed
by far infrared (FIR) spectroscopy. The microscopic processes which take place in
an electrolytic solution cover a wide range of time scales, from 10^-2 s to
10^-14 s. Dielectric relaxation and conductivity measurements probe time scales
down to about 10^-10 s and give information on the collective reorientation of ion
pairs and ionic aggregates in the solution, as well as on the translational
diffusion of the ionic species (ref. 5). For solutions of low conductivity,
permittivity measurements give direct access to the structure and long-time
dynamics of ion pairs and aggregates. By contrast, when solutions of high
conductivity are investigated, the interpretation of the measurements is greatly
complicated by the presence of a large number of free ions in the solution.
However, at much higher frequencies, i.e. in the far infrared range
(10^-12 s - 10^-14 s), the conductivity of the solution does not affect the
absorption coefficient, and the infrared band is then a very sensitive probe of
the molecular dynamics on this time scale. This fact was recognized in the late
sixties and seventies, when several authors (refs. 6-13) discovered that the far
infrared band of alkali metal salts in various solvents (DMSO, THF,
2-pyrrolidone, sulfolane, acetone, etc.) could be attributed to the vibration of
the alkali ion in its first solvation shell. Nevertheless, the assignment of a
FIR band is not obvious and strongly depends on the solvent. For example, Evans
and Lo (ref. 14) found that, for tetraalkylammonium salts in benzene, the FIR
band could be attributed to a cation-anion vibration. On the other hand, the FIR
spectra of alkali metal salts in solvents of higher dielectric constant were
interpreted as due to the vibration of the cation in a cage formed, wholly or
partly, by the solvent molecules; in the latter case the anion participates in
the rattling motion. However, these early studies suffer from two drawbacks.
First, the assignment of the FIR band is only qualitative, since the
interpretation is not based on a rigorous theoretical treatment of the band shape
such as exists for Raman and infrared bands in neat liquids. Second, owing to the
intrinsic difficulty of recording such spectra, the band shapes are poorly
resolved (when published at all) and many spectral features may have been missed.

Our purpose is to reexamine the FIR spectra of alkali halide salts in
different solvents of high dielectric constant, namely a protic solvent,
methanol, and a dipolar aprotic solvent, acetonitrile. The experimental band
shapes are interpreted in the framework of a time-dependent correlation
function formalism. The correlation functions associated with the electrolytic
solution are described by a generalized Langevin theory, and the parameters of
the theory are estimated with the help of recent computer simulation data on
ionic solutions. A quantitative description of the ionic motions taking place in
the solution is given, and the ionic entities giving rise to specific frequency
modes are identified.

EXPERIMENTAL  METHOD 

Absorption spectra were recorded with a Michelson interferometer, CODERG FS
2000, between 20 and 600 cm-1. Since dipolar solvents such as methanol and
acetonitrile exhibit rather strong absorption bands in this frequency range, the
absorption pathlength through the sample was limited to about 50 µm. The optical
cell was thus made of two polyethylene windows with a flat mylar ring pressed
between them. This cell was easily filled without bubbles by running the liquid
through two plastic capillary tubes inserted in one of the windows. The exact
sample thickness was determined a posteriori from the interference fringes
appearing in the spectrum of the empty cell.
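The fringe-based thickness determination can be sketched as follows. This is a minimal illustration, assuming normal incidence and an air gap (refractive index n = 1), for which successive transmission maxima are separated by Δν = 1/(2nd) in wavenumber; the function name and the fringe positions used in the example are hypothetical, not measurements from this work.

```python
import numpy as np


def thickness_from_fringes(peak_wavenumbers_cm1, refractive_index=1.0):
    """Estimate a cell gap (in micrometres) from interference-fringe maxima.

    Assumes normal incidence, for which successive maxima in wavenumber are
    spaced by 1 / (2 * n * d). The spacing is taken as the slope of a
    straight-line fit of peak position against fringe index, which averages
    out the error on individual peak positions.
    """
    nu = np.sort(np.asarray(peak_wavenumbers_cm1, dtype=float))
    if nu.size < 2:
        raise ValueError("need at least two fringe maxima")
    order = np.arange(nu.size)
    spacing_cm1 = np.polyfit(order, nu, 1)[0]  # cm-1 per fringe
    thickness_cm = 1.0 / (2.0 * refractive_index * spacing_cm1)
    return thickness_cm * 1.0e4  # cm -> micrometres


# Hypothetical maxima spaced by 100 cm-1, as a gap of about 50 µm would give.
peaks = [105.0, 205.0, 305.0, 405.0, 505.0]
print(f"gap = {thickness_from_fringes(peaks):.1f} µm")
```

Fitting all maxima, rather than dividing by the distance between the first and last one, makes the estimate less sensitive to a single badly located fringe.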

All products, salts (LiI, RbI) as well as solvents (CH3OH, CH3CN), were used as
delivered by Aldrich Chemie, without further purification. In order to extract,
from the total absorption spectrum of the solution, the part due to the
salt-solvent interaction, the spectrum of each solution was referred to that of
the pure solvent, i.e. by plotting the quantity ln(I0/I), where I0 and I are the
intensities transmitted through the pure solvent and through the solution,
respectively. However, at the relatively high salt concentrations considered
here (0.5 - 1.5 mol/kg), the number density of solvent molecules in the solution
is appreciably lower than its value in the pure solvent. This is clearly
illustrated in our absorption spectra by the existence of some small negative
regions (see Fig. 1). A correction of these spectra is now in progress and will
appear in a later, more extended publication. Nevertheless, it must be
emphasized that the general spectral features examined here will remain
practically unchanged.
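The solvent-referenced spectrum ln(I0/I) described above can be computed point by point, and it also shows directly where the negative regions come from. The sketch below additionally divides by the path length to express the result per cm; that normalization, the function name and the intensities in the example are assumptions for illustration only.

```python
import numpy as np


def solvent_referenced_absorption(i_solvent, i_solution, thickness_um):
    """Return ln(I0/I) per cm of path (cm-1), point by point.

    i_solvent  : transmitted intensity through the pure solvent (I0)
    i_solution : transmitted intensity through the solution (I)
    Negative values mean the solution transmits more than the pure solvent,
    e.g. because the solvent number density is lower in the solution.
    """
    i0 = np.asarray(i_solvent, dtype=float)
    i = np.asarray(i_solution, dtype=float)
    if np.any(i0 <= 0.0) or np.any(i <= 0.0):
        raise ValueError("intensities must be strictly positive")
    thickness_cm = thickness_um * 1.0e-4
    return np.log(i0 / i) / thickness_cm


i0 = np.array([1.00, 0.80, 0.50])  # hypothetical pure-solvent transmission
i = np.array([0.90, 0.60, 0.52])   # hypothetical solution transmission
alpha = solvent_referenced_absorption(i0, i, thickness_um=50.0)
print(alpha.round(1))
print("negative regions at indices:", np.flatnonzero(alpha < 0))
```

The third point comes out negative because the hypothetical solution transmits slightly more than the solvent there, which is the situation the correction mentioned above is meant to remove.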


THEORY  OF  THE  FIR  SPECTRUM  OF  ELECTROLYTIC  SOLUTIONS 

Until very recently the relationship between the electric permittivity of a
polar liquid and its molecular properties was the subject of continuous debate.
As a matter of fact, the frequency-dependent dielectric constant is related to
the correlation function of the microscopic dipole moments of the sample through
an equation which depends on the geometry of the sample. This is a well-known
effect of the long-range dipole-dipole interactions. This point has an important
consequence when the dielectric constant of a polar medium is evaluated by
computer simulation: the shape of the basic cell and the periodic boundary
conditions require the relevant expression for the dielectric constant to be
implemented; otherwise, spurious results are obtained. In the case of an
electrolytic solution, the problem is complicated by the electric conductivity
of the medium. However, by using linear response theory, it is possible to derive
a simple expression for the permittivity, ε(ω), associated with a spherical
liquid sample of volume V (this system is chosen for convenience) embedded in a
perfectly conducting medium (see refs. 5,15),


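$$\varepsilon(\omega)-\varepsilon_\infty=\frac{4\pi}{3VkT}\Big[\langle M^2\rangle+i\omega\langle\mathbf{M}\cdot\mathbf{M}\rangle_\omega+2\langle\mathbf{M}\cdot\mathbf{J}\rangle_\omega+\frac{i}{\omega}\langle\mathbf{J}\cdot\mathbf{J}\rangle_\omega\Big]\qquad(1)$$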
where M takes into account all the microscopic dipole moments (permanent and
induced) existing in the solution, J is the ionic current generated by the
fraction of dissociated salt, ⟨ ⟩_ω denotes the Laplace transform of the
corresponding correlation function, and ε∞ is the value of the dielectric
constant in the high-frequency limit*. In a far infrared experiment one measures
the absorption coefficient, α(ω) = ω Im ε(ω)/nc, where n is the refractive index.
Defining M_ion = Σ_j q_j R_j, where q_j and R_j are the charge and position of
ion j (so that J = dM_ion/dt), and after some algebra, the absorption coefficient
of an electrolytic solution is given by**

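$$\alpha(\omega)=\frac{2\pi\,\omega^2}{3nckTV}\int_{-\infty}^{+\infty}dt\;e^{-i\omega t}\,\big\langle(\mathbf{M}(t)+\mathbf{M}_{ion}(t))\cdot(\mathbf{M}(0)+\mathbf{M}_{ion}(0))\big\rangle\qquad(2)$$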
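To see how eqn. (2) turns a time correlation function into an absorption band, the sketch below evaluates the Fourier integral numerically for a real, even CF sampled on t ≥ 0 (the integral then reduces to twice a cosine transform) and drops the constant prefactor 2π/3nckTV. The damped-cosine model CF, the time grid and the function names are hypothetical choices, made because this model has a closed-form transform against which the numerical result can be checked.

```python
import numpy as np


def spectral_density(times, corr, omegas):
    """Integral of exp(-iωt) C(t) over all t, for a real, even C(t) given on t >= 0.

    For an even C(t) the imaginary part cancels, leaving 2 * ∫_0^∞ cos(ωt) C(t) dt,
    evaluated here with the trapezoidal rule.
    """
    t = np.asarray(times, dtype=float)
    c = np.asarray(corr, dtype=float)
    dt = np.diff(t)
    result = []
    for w in np.atleast_1d(omegas):
        f = np.cos(w * t) * c
        result.append(2.0 * np.sum(0.5 * (f[1:] + f[:-1]) * dt))
    return np.array(result)


def absorption_shape(times, corr, omegas):
    """Classical band shape: ω² times the spectral density (prefactor omitted)."""
    w = np.atleast_1d(np.asarray(omegas, dtype=float))
    return w**2 * spectral_density(times, corr, w)


# Hypothetical model CF, arbitrary units: C(t) = exp(-γt) cos(ω0 t).
gamma, omega0 = 0.5, 2.0
t = np.linspace(0.0, 60.0, 60001)
c = np.exp(-gamma * t) * np.cos(omega0 * t)
w = np.array([1.0, 2.0, 3.0])
exact = gamma / (gamma**2 + (w - omega0) ** 2) + gamma / (gamma**2 + (w + omega0) ** 2)
print(spectral_density(t, c, w), exact)
print(absorption_shape(t, c, w))
```

The ω² factor is what shifts weight toward the high-frequency side of a band; a slowly decaying CF gives a narrow band, a rapidly damped one a broad band.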

It is worth noting that in the case of a non-dissociating solvent (e.g.
benzene), M_ion = 0 and expression (2) becomes identical to that of a neat
liquid. Nevertheless, to go further in a formal analysis, it is necessary to
define what we call a chemical model of the electrolytic solution. It is well
established (ref. 16) that in an electrolytic solution free ions, a variety of
ion pairs (in contact or solvent-separated), triple ions and higher ionic
aggregates coexist as distinct species, each endowed with its own physical
properties. In highly dissociating solvents such as methanol and acetonitrile, an
appreciable amount of salt is dissociated. We therefore postulate that the ions
are either free or associated in pairs, the concentration of each species being
governed by the dissociation constant. For ion pairs we assume that they can
exist either as contact ion pairs (CIP), tightly bound, or as solvent-separated
ion pairs (SSIP), loosely bound. Moreover, we neglect higher ionic aggregates.
Within this model it is then possible to specify more precisely all the molecular
mechanisms which contribute to the absorption coefficient.
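The chemical model above can be made concrete with a small speciation calculation. This is a minimal sketch, assuming ideal behaviour (activity coefficients set to 1), a single dissociation constant K_D for the pair/free-ion equilibrium, and a hypothetical constant K = [SSIP]/[CIP] to split the pairs; neither constant is given in the text, and the numbers in the example are placeholders.

```python
import math


def species_fractions(c_salt, k_dissociation, k_ssip_over_cip):
    """Fractions of the salt present as free ions, CIP and SSIP.

    Assumes ideal behaviour (no activity coefficients) and the equilibria
        ion pair <=> M+ + X-     K_D = alpha**2 * c / (1 - alpha)
        CIP <=> SSIP             K = [SSIP] / [CIP]   (hypothetical constant)
    where alpha is the dissociated fraction and c the stoichiometric concentration.
    """
    if c_salt <= 0 or k_dissociation <= 0 or k_ssip_over_cip < 0:
        raise ValueError("c and K_D must be positive and K non-negative")
    root = math.sqrt(k_dissociation**2 + 4.0 * c_salt * k_dissociation)
    # Positive root of c*alpha**2 + K_D*alpha - K_D = 0, written to avoid cancellation.
    alpha = 2.0 * k_dissociation / (k_dissociation + root)
    paired = 1.0 - alpha
    cip = paired / (1.0 + k_ssip_over_cip)
    ssip = paired - cip
    return {"free": alpha, "cip": cip, "ssip": ssip}


# Hypothetical constants for a 1 mol/kg solution.
print(species_fractions(1.0, 0.5, 1.0))
```

These fractions are the concentration ratios that weight the individual band contributions; the discussion below notes that they are poorly known, which is why the relative band intensities are only indicative.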

In fact, even with this small number of well-defined species (free ions, ion
pairs, solvent molecules), the evaluation of eqn. (2) is hardly tractable in its
full complexity. Hence we make the following approximations:

a) the solvent molecules are polarized only by the Coulomb charges of the ions,

b) the polarizability α_s of the solvent molecules is assumed isotropic,

c) the ions are not polarized by the solvent molecules.

* In practice, this value is taken in the IR frequency range corresponding to
the absorption of light by the intramolecular vibration modes of the molecules.

** A rigorous expression taking into account the detailed balance principle
would introduce into eqn. (2) a factor proportional to ω tanh(ħω/2kT) in place of
ω², together with a symmetrized CF.

Then, the total dipole moment due to all neutral species is given by

$$\mathbf{M}=\mathbf{M}_s+\mathbf{M}_{agg}+\mathbf{M}_{pol},\qquad\mathbf{M}_s=\sum_{i=1}^{N_s}\boldsymbol{\mu}_i\qquad(3a,b)$$

$$\mathbf{M}_{agg}=\sum_{i=1}^{N_{cip}}\boldsymbol{\mu}_i^{cip}+\sum_{i=1}^{N_{ssip}}\boldsymbol{\mu}_i^{ssip},\qquad\mathbf{M}_{pol}=\alpha_s\sum_{\gamma=+,-}\;\sum_{i_\gamma,j}^{N_\gamma,N_s}\mathbf{E}_{i_\gamma j}\qquad(3c,d)$$

where μ_i is the permanent dipole of the solvent molecule i, μ_i^cip and μ_j^ssip
are the dipole moments of the contact ion pair i and of the solvent-separated ion
pair j, respectively, and α_s E_iγj is the dipole moment induced on the solvent
molecule j by the electric field emanating from the ion iγ. With the definitions
(3a-d) and that of M_ion given previously, it is easy to write formally the
microscopic expression of α(ω); but to be useful, it requires some additional
simplifications. A detailed analysis of the theory beyond this step is not
possible in this brief report; it will be the subject of a forthcoming extended
paper (ref. 17). Nevertheless, the following comments merit attention. In highly
dissociating solvents and at low salt concentration, the free ions and the ion
pairs have only solvent molecules in their first solvation shell. Since the
electrostatic interactions are strongly screened by the polar solvent beyond the
first solvation shell, one can reasonably assume that the molecular motions of
the ions are decoupled from each other. Moreover, since the ions are strongly
solvated, their motions are characterized by an oscillation in the cage of the
first neighbours at short times (~10^-14 - 10^-13 s), and by a slow drift at
longer times (~10^-12 s). All these features can be incorporated into the theory.
This leads to the following approximate formula for G(t), the time CF figuring in
eqn. (2),


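$$G(t)=\langle\mathbf{M}_{pol}(t)\cdot\mathbf{M}_{pol}(0)\rangle+\langle\mathbf{M}_{agg}(t)\cdot\mathbf{M}_{agg}(0)\rangle+\langle\mathbf{M}_s(t)\cdot\mathbf{M}_s(0)\rangle\qquad(4)$$

$$\langle\mathbf{M}_{pol}(t)\cdot\mathbf{M}_{pol}(0)\rangle=(\alpha_s e)^2\sum_{\gamma=+,-}\;\sum_{i_\gamma,j}^{N_\gamma,N_s}\Big[\Big\langle\frac{\mathbf{u}_{i_\gamma j}(t)\cdot\mathbf{u}_{i_\gamma j}(0)}{r_{i_\gamma j}^2(t)\,r_{i_\gamma j}^2(0)}\Big\rangle+\langle\mathbf{Q}_\gamma(t)\cdot\mathbf{Q}_\gamma(0)\rangle\Big\langle\frac{3\big(\mathbf{u}_{i_\gamma j}(t)\cdot\mathbf{u}_{i_\gamma j}(0)\big)^2-1}{r_{i_\gamma j}^3(t)\,r_{i_\gamma j}^3(0)}\Big\rangle\Big]\qquad(5)$$

$$\langle\mathbf{M}_{agg}(t)\cdot\mathbf{M}_{agg}(0)\rangle=e^2\sum_{i=1}^{N_{cip}}\langle\mathbf{Q}_i^{cip}(t)\cdot\mathbf{Q}_i^{cip}(0)\rangle+e^2\sum_{i=1}^{N_{ssip}}\langle\mathbf{Q}_i^{ssip}(t)\cdot\mathbf{Q}_i^{ssip}(0)\rangle\qquad(6)$$

$$\langle\mathbf{M}_s(t)\cdot\mathbf{M}_s(0)\rangle=\sum_{i,j}^{N_s}\langle\boldsymbol{\mu}_i(t)\cdot\boldsymbol{\mu}_j(0)\rangle_{solution}\qquad(7)$$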


where r_iγj is the distance between the ion iγ and the solvent molecule j, u_iγj
is the corresponding unit vector, Q_γ is the caging vibration mode of the ion iγ,
Q_i^cip (or Q_i^ssip) is the distance separating the centers of charge of the
contact ion pair i (or of the SSIP), and μ_i is the permanent dipole moment of the
solvent molecule i. In practice, what is measured is the absorption difference
between the solution and an equivalent amount of pure solvent. Thus we must
subtract from G(t) the CF ⟨M_s(t)·M_s(0)⟩_0 of the neat solvent. This brings in
a time correlation function of structural relaxation, G_relax(t), which expresses
the reorganization of the solvent molecules located just around the ionic
species. Indeed, beyond the first solvation shell surrounding the ionic species,
the solvent molecules behave as in the bulk solvent and do not contribute to
G_relax(t).

In summary, the present theoretical analysis predicts that the FIR absorption
spectrum of an electrolytic solution is mainly governed by three mechanisms: (i)
an induction mechanism modulated by the oscillatory motion of the solvated ions,
(ii) a dipolar absorption mechanism due to the presence of ion pairs (CIP and
SSIP), which vibrate and librate like diatomic molecules, and (iii) a structural
relaxation mechanism which expresses the reorganization of the solvent molecules
during the solvation process.

In order to make a quantitative comparison with the experimental data, we have
developed a band shape analysis based on the generalized Langevin equation.
According to the Mori formalism (ref. 18), the vibrational variables Q_γ, Q^cip
and Q^ssip are treated as Brownian harmonic oscillators (anharmonicity is also
taken into account). The oscillatory motion is characterized by a frequency mode,
ω_γ = √(k_γ/m_γ), where k_γ is the force constant and m_γ the corresponding
reduced mass, and by a damping function describing the translational diffusion of
the species. In the case of solvated ions, k_γ is deduced numerically from the
computer simulation data of Heinzinger et al. (ref. 19) on alkali halide salts in
aqueous solutions. In the case of ion pairs, Karim and McCammon (ref. 20) have
investigated the hydration of a sodium chloride ion pair. By performing a Monte
Carlo simulation, these authors evaluated the solvent-averaged potential of mean
force acting on each partner of the pair. Although these computer simulations deal
only with aqueous solutions, the close similarity between water and methanol, as
far as the solvation process is concerned, enables us to assume that the force
constants k_γ, k_cip and k_ssip deduced from these studies give the correct order
of magnitude for the alcoholic solution. For dipolar aprotic solvents (e.g.
acetonitrile), the molecular dynamics simulation of Ciccotti et al. (ref. 21),
which evaluates the potential of mean force of an ion pair immersed in a solvent
modelled by polar diatomic molecules, merits attention. This model system is only
loosely connected with real solutions, and our use of the force constants k_cip
and k_ssip obtained by these authors must be understood as an attempt to test our
theory. Finally, the determination of the absorption spectrum also requires the
evaluation of several translational and rotational correlation functions (see
eqns. 5-7). These can also be evaluated in the framework of the Mori formalism.
The main effect of these translational and rotational contributions is to broaden
and somewhat distort the band shape generated by the oscillatory modes.
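The relation ω_γ = √(k_γ/m_γ) and the Brownian-oscillator picture can be illustrated numerically. The sketch below converts a force constant and reduced mass into a wavenumber and evaluates the band shape of a harmonic oscillator with a constant (memory-free) friction; this Markovian, harmonic limit is a deliberate simplification and not the full Mori treatment with its damping function and anharmonicity. The force constant, masses, friction and function names are hypothetical.

```python
import math

import numpy as np

AMU_KG = 1.66053906660e-27  # atomic mass unit in kg
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def wavenumber_cm1(force_constant_n_per_m, reduced_mass_amu):
    """Harmonic wavenumber ω / (2πc) with ω = sqrt(k/m)."""
    omega = math.sqrt(force_constant_n_per_m / (reduced_mass_amu * AMU_KG))
    return omega / (2.0 * math.pi * C_CM_PER_S)


def force_constant_n_per_m(wavenumber, reduced_mass_amu):
    """Inverse of wavenumber_cm1: k = m (2πc ν)²."""
    omega = 2.0 * math.pi * C_CM_PER_S * wavenumber
    return reduced_mass_amu * AMU_KG * omega**2


def brownian_oscillator_band(wavenumbers, centre, friction):
    """Absorption shape ω² S(ω) of a Brownian harmonic oscillator, arbitrary units.

    S(ω) ∝ γ / ((ω0² - ω²)² + γ² ω²) is the position spectrum for a constant
    friction γ; centre, friction and the grid are all expressed in cm-1.
    """
    w = np.asarray(wavenumbers, dtype=float)
    return w**2 * friction / ((centre**2 - w**2) ** 2 + (friction * w) ** 2)


# Hypothetical: a Li+ ion (6.94 amu) in a rigid cage with k = 65 N/m.
print(f"{wavenumber_cm1(65.0, 6.94):.0f} cm-1")
grid = np.arange(1.0, 800.0, 0.5)
band = brownian_oscillator_band(grid, 410.0, 60.0)
print("band maximum at", grid[np.argmax(band)], "cm-1")
```

With the ω² factor included, this line shape peaks exactly at ω0 whatever the friction, while the friction controls the width; this is consistent with the remark that diffusion mainly broadens the bands generated by the oscillatory modes.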

DISCUSSION 

The solvation bands corresponding to LiI/CH3OH, LiI/CH3CN, LiCl/CH3OH and
RbI/CH3OH, respectively, are presented in Fig. 1. The absorption profiles of
lithium salts in methanol and acetonitrile solutions are characterized by three
bands: a strong high-frequency band that extends into the mid infrared range and
two weak bands in the low-frequency part of the spectrum. The high-frequency peak
(460 cm-1 in CH3OH and 408 cm-1 in CH3CN) is quite similar to the one observed by
Maxey and Popov (refs. 7,8) in their study of lithium salts in DMSO (where the
band lies at 429 cm-1). This band was assigned to the vibration of the cation in a
solvent cage, but these authors did not pay attention to the low-frequency
spectrum. Furthermore, our study shows that the low-frequency bands are affected
both by the halide anion and by the solvent. When methanol is replaced by
acetonitrile in LiI solutions, the Li+ solvation band is shifted to lower
frequency (460 → 408 cm-1), whereas the other two bands are strongly shifted in
the opposite direction (235 → 362 cm-1 and 85 → 145 cm-1, respectively). When
iodide is replaced by chloride in methanol solution, the Li+ band and the
intermediate band are mostly unaffected, but the band at 85 cm-1 shifts to
135 cm-1. Finally, for RbI in CH3OH, the absorption profile is mainly located at
low frequency, with a maximum around 85 cm-1.

All the aforementioned spectral features can be understood in the framework of
our theoretical analysis. The theory predicts four vibration bands corresponding
to the oscillatory motions of the four ionic species (cation, anion, CIP, SSIP).
Moreover, in the case of ion pairs (CIP and SSIP), each vibration band is
convoluted with a rotation band which expresses the libration of the ion pair in
a solvent cage. To illustrate this, we show in Fig. 2 the theoretical solvation
band of LiI/CH3CN. One notices (i) a high-frequency peak at 410 cm-1
corresponding to the vibration of Li+, (ii) an intermediate peak at 340 cm-1 due
to the stretching mode of the CIP, and (iii) a composite low-frequency band
(0-120 cm-1) which is the superposition of the oscillatory mode of I- (87 cm-1),
the stretching mode of the SSIP (75 cm-1), and the libration modes of the CIP and
SSIP (100 cm-1 and 35 cm-1, respectively). The likeness between the theoretical
and experimental spectra is striking. However, the relative contribution of each
band to the calculated absorption intensity is only indicative, since it is
proportional to the concentration ratio of the ionic species, a quantity which is
poorly known. In the same way, the calculated bandwidths are related to the
diffusion coefficients of the ionic species; but there too, these quantities are
not accurately known. Nevertheless, the present theoretical analysis elucidates
the origin of the spectrum and quantitatively accounts for the solvent effect. For
example, if acetonitrile is replaced by methanol, the theory predicts that the
Li+ band is shifted to higher frequency (410 → 450 cm-1) as a result of a
tightening of the first solvation shell, while the frequency of the CIP
stretching mode decreases (340 → 200 cm-1), since the net force between the
partners is weakened by the protic solvent. For RbI in methanol, the vibration
bands collapse into a single band located below 120 cm-1. All these predictions
are confirmed by the experimental data.
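Since ω_γ = √(k_γ/m_γ), a predicted frequency shift translates directly into a change of effective force constant, provided the reduced mass is unchanged (an assumption, since the ion-solvent and ion-ion reduced masses could differ between solvents). The short check below, which is not part of the original analysis, applies this to the two theoretical shifts quoted above.

```python
def force_constant_ratio(wn_before, wn_after):
    """k_after / k_before implied by a harmonic frequency shift at fixed reduced mass."""
    if wn_before <= 0 or wn_after <= 0:
        raise ValueError("wavenumbers must be positive")
    return (wn_after / wn_before) ** 2


# Theoretical shifts quoted in the text, acetonitrile -> methanol (cm-1).
shifts = {
    "Li+ cage vibration": (410.0, 450.0),
    "CIP stretching mode": (340.0, 200.0),
}
for label, (before, after) in shifts.items():
    print(f"{label}: k(CH3OH)/k(CH3CN) = {force_constant_ratio(before, after):.2f}")
```

Under this assumption, the quoted shifts correspond to a Li+ cage roughly 20% stiffer in methanol and a CIP force constant reduced to about a third, in line with the "tightening" and "weakening" invoked above.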
DNA STRUCTURAL PARAMETER DETERMINATION BY VIBRATIONAL DATA
Cedex (France)


SUMMARY 

Normal coordinate analysis based on Raman and infrared data
of oligo- and polynucleotides allowed us to estimate the
structural parameters (dihedral angles) of the nucleosidic
residues involved in DNA double helical chains. The present
estimation, derived essentially from the vibrational spectra
recorded in solution (Raman) or in hydrated thin films
(infrared), complements quite well those based on other
physical methods such as NMR or X-ray diffraction.


INTRODUCTION 

The purpose of this work is to show how vibrational
spectroscopy can be used as a probe to determine
molecular structural parameters. DNA has been selected as
a characteristic example because of its high flexibility. The
ability of this macromolecule to adopt right- and left-handed
double helices is now well known. Among the molecular
structural parameters, torsion angles play the most important
role in DNA conformational transitions: changes in these
angles allow the double helix to go from one form to another.
A conformational transition can be detected by vibrational
spectroscopy through changes in the intensity and/or shifts
in the positions of Raman and infrared peaks.

From a classical viewpoint, the vibrating molecular system
can be considered as a set of coupled oscillating point
masses. A harmonic force field approximation is commonly
used to express the potential energy of the vibrating
molecule. This energy can be assumed to be invariant upon
conformational transitions. Thus, only the change in the
kinetic energy, which depends strongly on the molecular
geometry, is considered responsible for the modifications
observed in Raman or infrared spectra. Previously, a reliable
simplified valence force field was developed in our
numerical investigations so as to reproduce the positions of
the characteristic vibrational modes (vibrational markers) of
the nucleosides (refs. 1-2), sugar (ref. 3) and phosphate
backbone (ref. 4).

In the present work our aim is to demonstrate how the
nucleosidic markers behave upon variation of the
structural parameters. Our attention is especially focused on
how the structural parameters can be deduced from this kind
of investigation.

NUCLEOSIDIC  VIBRATIONAL  MARKER  ANALYSIS 
Calculation  details 

Normal mode analysis is based on the Wilson GF method (ref. 5).
To perform the present calculations, a dedicated code (NUCS)
has been developed. Redundancy among the internal coordinates
has been entirely resolved. Numerical computations have been
carried out both on a DPX network (BULL) and on a CRAY-2
computer. The CRAY version has been vectorized and
multitasked in order to improve its run time; it runs about
340 times faster than the scalar DPX version.
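The Wilson GF method reduces to an eigenvalue problem for the product of the kinetic-energy matrix G and the force-constant matrix F. The sketch below is not the NUCS code: it is a two-coordinate toy (the two bond stretches of a bent Y-X-Y fragment, with hypothetical masses and force constants) illustrating the assumption made in the introduction, namely that with F held fixed the frequencies still move when the geometry, and hence G, changes.

```python
import numpy as np

AMU_KG = 1.66053906660e-27  # kg per atomic mass unit
C_CM_PER_S = 2.99792458e10  # speed of light, cm/s
MDYN_PER_ANGSTROM = 100.0   # 1 mdyn/Å = 100 N/m


def gf_wavenumbers(g_matrix, f_matrix):
    """Solve |GF - λE| = 0 and return the wavenumbers (cm-1) in increasing order.

    G is in amu^-1 (stretching coordinates only, so no length factors) and F in
    mdyn/Å; after conversion to SI units each eigenvalue λ equals (2πcν)².
    """
    g = np.asarray(g_matrix, dtype=float)
    f = np.asarray(f_matrix, dtype=float)
    lam = np.linalg.eigvals(g @ f).real
    lam = np.clip(lam, 0.0, None)  # guard against tiny negative round-off
    omega = np.sqrt(lam * MDYN_PER_ANGSTROM / AMU_KG)  # rad/s
    return np.sort(omega / (2.0 * np.pi * C_CM_PER_S))


def xy2_stretch_g(mass_x, mass_y, angle_deg):
    """G matrix for the two X-Y bond stretches of a bent Y-X-Y fragment."""
    mu_x, mu_y = 1.0 / mass_x, 1.0 / mass_y
    g12 = mu_x * np.cos(np.radians(angle_deg))  # coupling through the shared atom X
    return np.array([[mu_x + mu_y, g12], [g12, mu_x + mu_y]])


# Hypothetical force field, held fixed while the bond angle changes.
F = np.array([[5.0, 0.2], [0.2, 5.0]])
for angle in (90.0, 110.0, 180.0):
    print(angle, gf_wavenumbers(xy2_stretch_g(12.0, 16.0, angle), F).round(1))
```

The shift between the printed rows comes entirely from the off-diagonal G element, since F is identical in all three cases; in the nucleoside problem the torsion angles P and χ play the role of the bond angle here.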

Because of their important contribution to the DNA vibrational
markers, the purine deoxynucleotides, i.e. dG and dA, are
studied in the present investigation. For both of these
residues, the dynamic models shown in Figure 1 have been
used. They consist of the guanine and adenine residues
attached to 2'-deoxyribose sugars, and are extended up to
their O3' and O5' terminals. The orientation of the base is
determined by the glycosidic torsion angle, i.e.
χ (O4'-C1'-N9-C4). Two distinct conformations can be
considered, namely anti (180° < χ < 300°) and syn
(0° < χ < 90°). The sugar conformation depends on the five
dihedral angles around the ribose ring bonds, which can be
estimated by the following expression:

$$\tau_i=\tau_m\cos\Big(P+(i+2)\,\frac{4\pi}{5}\Big),\qquad i=1,2,3,4,5$$
where τm represents the puckering amplitude (deviation from a
planar sugar) and P denotes the phase angle of pseudorotation
characterizing the ring conformation. Two types of sugar are
recognized, namely N (north)-type (P = 0° ± 90°) and S (south)-type
(P = 180° ± 90°) (Fig. 2).
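The pseudorotation expression above can be evaluated in both directions: from (P, τm) to the five ring torsions, and back. The sketch below implements the formula as written (i = 1..5) and inverts it with the Altona-Sundaralingam relation rewritten for this indexing; the assignment of i = 1 to the C4'-O4'-C1'-C2' torsion (ν0 in the usual convention) is an assumption, since the text does not list which bond each index refers to. The default τm = 40° follows the value used in the calculations below.

```python
import math


def sugar_torsions(phase_deg, amplitude_deg=40.0):
    """Endocyclic torsions τ_i = τ_m cos(P + (i+2)·4π/5), i = 1..5, in degrees."""
    p = math.radians(phase_deg)
    return [amplitude_deg * math.cos(p + (i + 2) * 4.0 * math.pi / 5.0)
            for i in range(1, 6)]


def pseudorotation_from_torsions(torsions_deg):
    """Recover (P, τ_m) from the five torsions listed in the order i = 1..5.

    Uses tan P = [(τ5 + τ2) - (τ4 + τ1)] / [2 τ3 (sin 36° + sin 72°)], the
    Altona-Sundaralingam relation written for this indexing (τ3 = τ_m cos P).
    """
    t1, t2, t3, t4, t5 = torsions_deg
    s = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))
    sin_part = ((t5 + t2) - (t4 + t1)) / (2.0 * s)  # equals τ_m sin P
    phase = math.degrees(math.atan2(sin_part, t3)) % 360.0
    amplitude = math.hypot(sin_part, t3)
    return phase, amplitude


def pucker_type(phase_deg):
    """'N' for P within 0° ± 90°, 'S' for P within 180° ± 90°."""
    return "N" if math.cos(math.radians(phase_deg)) > 0.0 else "S"


for p in (18.0, 162.0):
    tors = sugar_torsions(p)
    print(p, pucker_type(p), [round(x, 1) for x in tors],
          pseudorotation_from_torsions(tors))
```

A useful sanity check on any set of torsions generated this way is that the five values sum to zero, as expected for a closed five-membered ring in this model.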


Fig. 1. The dA residue (left) as involved in Z DNA (C3'-endo/syn
conformation) and the dG residue as encountered in B DNA
(C2'-endo/anti conformation).


Fig.  2.  Influence  of  the  pseudorotation  phase  angle  on  the 
sugar  pucker  conformation. 


Normal mode analysis has been performed in successive steps
of 36° for the P angle and 30° for the χ angle. The τm angle
has been fixed at 40°, which is a characteristic value for
the sugars found in DNA (ref. 3).
Marker modes and their behaviour versus structural parameters
Four of the dA and dG vibrational markers have been selected.
They are detected in almost all oligo- and polynucleotide
vibrational spectra and give rise to well-resolved infrared
and/or Raman bands (refs. 6-10). These modes have been
numbered from 1 to 4, and their positions as detected in the
A, B and Z forms of poly d(G-C) and poly d(A-T) are listed in
Table 1. The most striking effect is that these modes behave
in exactly the same manner for the dA and dG residues upon
DNA conformational transitions. As shown in Table 1, mode 1
is Raman active, while mode 2 is observed in infrared
spectra. Mode 3 can be detected in both Raman and IR spectra;
its evolution is, however, clearer in IR spectra, which is why
it is known rather as an infrared marker. Finally, mode 4 is
the most discussed Raman marker, because of its large shift
upon the B to Z conformational transition.

TABLE 1

Experimental positions (cm-1) of the four marker modes
studied in the present work.

             poly d(G-C)         poly d(A-T)
Markers      B        Z          A        B        Z
mode 1       1422a    1430a      1415b    1419c    1440c
mode 2       1420d    1409d      1416e    1425e    1408e
mode 3       1374d    1354d      1374e    1374e    1357e
mode 4       681a     625a       662b     666c     622c

a ref. 6
b ref. 9
c ref. 10
d ref. 7
e ref. 8
eref.  8 
The three-dimensional representations of the selected marker
frequencies obtained by our normal coordinate analysis for
the dG residue are shown in Figure 3. The P parameter does
not considerably affect mode 1, whereas both the P and χ
parameters affect mode 2. Modes 3 and 4 also depend on these
conformational parameters. A striking pattern of variation
has been obtained for mode 4: a very deep valley is found in
the region of N-type sugars connected to syn bases (Z form),
and a shallower minimum, corresponding to low anti bases
(A form), also appears on the mode 4 surface.

Similar surfaces have been obtained for the dA residue
marker modes. An extended report of these calculations,
accompanied by more detailed information on the normal
mode assignments, will appear in a forthcoming publication.

Estimation of the nucleosidic structural parameters

The structural parameters of the nucleosides involved in
right- and left-handed double helices can then be estimated
by comparing the experimental (Table 1) and calculated
(Fig. 3) results. For a given helical conformation, the P and
χ parameters are extracted from the regions of the surfaces
that give calculated wavenumbers satisfying the experimental
criteria for all four markers. The results of this estimation
are presented in Tables 2 and 3. This structural
determination based on vibrational spectroscopy is also in
good agreement with that derived from X-ray diffraction or
NMR spectra.
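The estimation procedure just described, keeping only the (P, χ) grid points whose calculated wavenumbers satisfy all four markers at once, amounts to intersecting tolerance bands on the computed surfaces. A minimal sketch, assuming the surfaces are available as arrays on the 36°/30° grid used in the calculations and that a single tolerance applies to every marker; the toy surfaces, observed values and tolerance below are hypothetical placeholders, not the NUCS results of Fig. 3 or the data of Table 1.

```python
import numpy as np

# Grid used in the calculations: P in steps of 36°, χ in steps of 30°.
P_GRID = np.arange(0, 360, 36)    # 10 values
CHI_GRID = np.arange(0, 360, 30)  # 12 values


def consistent_conformations(calculated, observed, tolerance_cm1):
    """Grid points (P, χ) whose calculated wavenumbers match every observed marker.

    calculated : dict marker -> array of shape (len(P_GRID), len(CHI_GRID)), cm-1
    observed   : dict marker -> experimental wavenumber, cm-1
    """
    mask = np.ones((P_GRID.size, CHI_GRID.size), dtype=bool)
    for marker, surface in calculated.items():
        surface = np.asarray(surface, dtype=float)
        if surface.shape != mask.shape:
            raise ValueError(f"surface for {marker} has shape {surface.shape}")
        mask &= np.abs(surface - observed[marker]) <= tolerance_cm1
    rows, cols = np.nonzero(mask)
    return [(int(P_GRID[r]), int(CHI_GRID[c])) for r, c in zip(rows, cols)]


# Toy surfaces (hypothetical): one marker depends on P only, the other on χ only,
# so their tolerance bands intersect at a single grid point.
P, CHI = np.meshgrid(P_GRID, CHI_GRID, indexing="ij")
toy = {"marker A": 600.0 + 0.1 * P, "marker B": 1400.0 + 0.1 * CHI}
obs = {"marker A": 603.6, "marker B": 1406.0}
print(consistent_conformations(toy, obs, tolerance_cm1=0.5))
```

With real surfaces, the width of the surviving region reflects both the tolerance and the grid spacing, which is one way to read the ± ranges reported in Tables 2 and 3.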

In conclusion, the present method allows a quantitative
determination of the structural parameters of oligo- and
polynucleotides in solid, fibrous or liquid phases. Moreover,
the study of the evolution of the vibrational modes versus
the dA residue conformation led us to predict the positions
of its marker modes in the Z form (ref. 11). These calculated
results could then be verified with the Raman spectrum of
poly d(A-T) adopting the Z helix conformation (ref. 10).


Fig. 3. Three-dimensional surfaces representing the variation
of the dG residue marker mode wavenumbers (cm-1) as a
function of the P and χ (degrees) structural parameters.
TABLE 2

Estimation of the P and χ parameters (in degrees) for the
dG residues involved in different conformations of poly
d(G-C).

DNA            P         χ         Nucleosidic
conformation                       conformation
B              162±18    270±15    C2'-endo/high anti
Z              18±36     60±15     C3'-endo/syn

TABLE 3

Same as Table 2 but for the dA residue involved in different
conformations of poly d(A-T).

DNA            P         χ         Nucleosidic
conformation                       conformation
A              18±36     210±30    C3'-endo/low anti
B              162±36    270±15    C2'-endo/high anti
Z              18±36     60±15     C3'-endo/syn
FORCE FIELD CALCULATIONS OF SIDEROPHORES.

STABILITY AND CONFORMATIONS OF Fe(III) CHELATES WITH
NOVEL IRON-RELATED PARAMETERS.
2 Fac. Sciences and Tech., Dept. Chemistry, route de Kairouan, Monastir, Tunisia.

SUMMARY 

Quantitative estimates of the conformations and energies of iron carriers (siderophores) are obtained by molecular mechanics. New iron-related parameters are designed for use with the 1985 Allinger MM2 parametrization. They are tested on new FeIII chelates containing catechol and carboxylic acid subunits. The results help in tailoring new ligands and encourage further force field calculations on iron and other high-valent metal chelates.

INTRODUCTION 

Iron is essential for life. In humans a lack of the element causes anemia, but an excess is toxic (ref 1). In plants, a lack of the element causes ferric chlorosis (ref 2). Iron carriers (siderophores) are organic ligands able to complex and transport FeIII so as to establish an appropriate level of the element (ref 3). So far, no quantitative study relating the structure of these molecules to their energy of association with the ferric ion has been carried out. The purpose of this paper is to describe the new parameters designed for, and the calculations made with, the molecular mechanics procedure used to study this question.

MOLECULAR  MECHANICS  CALCULATIONS 
The  method 

Molecular mechanics methods (also called empirical force field methods) have been used increasingly to study molecular structure and properties (refs 4-7). These methods assume that the energy of a molecule can be expressed by empirical expressions based on classical mechanical principles. The total energy of the molecular system under study is treated as the sum of several components:

$$E_{\text{total}} = E(l) + E(\theta) + E(\varphi) + E_{nb}(r)$$

where $E(l)$ is the bond length deformation energy, $E(\theta)$ the bond angle deformation energy ("Baeyer strain"), $E(\varphi)$ the torsional eclipsing energy ("Pitzer strain") and $E_{nb}$ the non-bonded or van der Waals interaction energy. Adjustable parameters have been evaluated for all energy components. In the 1985 Allinger MM2 program (ref 8), presently the most widely used, parameters are available for carbon, hydrogen and most heteroatoms, but not yet for iron. We have therefore first designed new parameters for treating the ferric ion.

Design of parameters related to FeIII.

Since iron is complexed by organic ligands through oxygen atoms, $E(l)$ and $E(\theta)$ involve only FeIII-oxygen interactions.

(i) Iron-oxygen bond length elongation energy $E(l)$. It is calculated by Hooke's law,

$$E(l) = \frac{1}{2}\sum k_l\,(l - l_0)^2 .$$

The Fe-O bond length is taken from X-ray data (ref 9) as $l_0 = 1.9955$ Å. The constant $k_l$ is estimated from IR data for $[\mathrm{FeF_6}]^{3-}$, where $\nu = 538$ cm⁻¹ corresponds to Fe-F stretching (ref 10). Since

$$\nu = \frac{1}{2\pi c}\sqrt{\frac{k}{\mu}}, \qquad \frac{1}{\mu} = \frac{1}{m_1} + \frac{1}{m_2}$$

(where $\mu$ is the reduced mass, $m_1$ the mass of iron and $m_2$ the mass of fluorine), then

$$k = 4\pi^2 \nu^2 c^2 \mu = 3.260\ \text{mdyn/Å}.$$
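The force constant step can be checked numerically. The Python sketch below implements $k = 4\pi^2 c^2 \nu^2 \mu$ with CODATA constants and leaves the effective mass as a parameter, because the text does not say exactly which mass produced 3.260 mdyn/Å. With the Fe-F reduced mass, 538 cm⁻¹ gives about 2.42 mdyn/Å; using the fluorine mass alone (the effective mass of the totally symmetric stretch of an octahedral MF6 ion, an assumption on our part) gives about 3.24 mdyn/Å, close to the quoted value.

```python
import math

C_CM_PER_S = 2.99792458e10       # speed of light, cm/s
AMU_KG = 1.66053906660e-27       # atomic mass unit, kg
N_PER_M_IN_MDYN_PER_A = 100.0    # 1 mdyn/Å = 100 N/m


def reduced_mass(m1_amu: float, m2_amu: float) -> float:
    """Reduced mass from 1/mu = 1/m1 + 1/m2 (all in amu)."""
    return 1.0 / (1.0 / m1_amu + 1.0 / m2_amu)


def force_constant(wavenumber_cm: float, mass_amu: float) -> float:
    """k = 4 pi^2 c^2 nu^2 mu for a band at `wavenumber_cm` (cm-1) and an
    effective mass `mass_amu` (amu); returned in mdyn/Å."""
    # (cm/s * cm-1)^2 = s^-2, times kg gives N/m
    k_si = 4 * math.pi ** 2 * (C_CM_PER_S * wavenumber_cm) ** 2 * mass_amu * AMU_KG
    return k_si / N_PER_M_IN_MDYN_PER_A
```

For example, `force_constant(538, reduced_mass(55.845, 18.998))` and `force_constant(538, 18.998)` give the two values discussed above.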

(ii) Bending (bond angle deformation) energy $E(\theta)$. This is also calculated by Hooke's law,

$$E(\theta) = \frac{1}{2}\sum k_b\,(\theta - \theta_0)^2 .$$

The constant $k_b$, also derived from IR data for $[\mathrm{FeF_6}]^{3-}$ (ref 10), is $k_b = 2.039$ mdyn Å/rad² from $\nu = 248$ cm⁻¹. $\theta_0$ is taken from the octahedral structure of the iron complex, in which the O-Fe-O angle is 90°.
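A minimal Python sketch of the two harmonic terms as written in (i) and (ii), using the quoted $l_0$, $k_l$, $\theta_0$ and $k_b$ as defaults. It assumes the plain harmonic form given in the text (the full MM2 stretch and bend terms include higher-order corrections not shown here) and converts mdyn Å, i.e. attojoules per molecule, into kcal/mol.

```python
import math

# 1 mdyn·Å = 1e-18 J (one attojoule) per molecule; x Avogadro's number / 4184 J per kcal
AJ_TO_KCAL_PER_MOL = 143.9325


def stretch_energy(l: float, l0: float = 1.9955, k_l: float = 3.260) -> float:
    """Harmonic Fe-O stretch E = 1/2 k_l (l - l0)^2; l, l0 in Å, k_l in mdyn/Å.
    Returns kcal/mol."""
    return 0.5 * k_l * (l - l0) ** 2 * AJ_TO_KCAL_PER_MOL


def bend_energy(theta_deg: float, theta0_deg: float = 90.0, k_b: float = 2.039) -> float:
    """Harmonic O-Fe-O bend E = 1/2 k_b (theta - theta0)^2; angles in degrees,
    k_b in mdyn·Å/rad^2 (the deviation is converted to radians). Returns kcal/mol."""
    dtheta = math.radians(theta_deg - theta0_deg)
    return 0.5 * k_b * dtheta ** 2 * AJ_TO_KCAL_PER_MOL
```

Stretching an Fe-O bond by 0.1 Å, `stretch_energy(2.0955)`, costs about 2.35 kcal/mol, and opening an O-Fe-O angle by 5°, `bend_energy(95)`, about 1.12 kcal/mol.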

(iii) Torsional eclipsing energy $E(\varphi)$. This is estimated by an equation of the form

$$E(\varphi) = \frac{1}{2}\sum \left[V_1(1+\cos\varphi) + V_2(1-\cos 2\varphi) + V_3(1+\cos 3\varphi) + \dots\right].$$

$V_1$, $V_2$ and $V_3$ were obtained by making the following approximations:


Atoms involved in      Parameters used         Taken from            In reference
torsional eclipsing    V1      V2      V3      "equivalent" series
energy
1-1-6-26               0.8     0       0.09    1-1-6-5               ref 11
3-1-6-26               0       0       0.09    3-1-6-5               ref 11
5-3-6-26               0       0       0.2     5-1-6-5               ref 11
2-2-6-26               1       1.65    0       2-2-6-5               ref 11
1-3-6-26               0       0.5     0       1-3-6-5               ref 11
7-3-6-26               3.28    5.6     0       7-3-6-5               ref 11
1-6-26-6               0       0       0.33    2-2-25-2              ref 12
2-6-26-6               0       0       0.33    2-2-25-2              ref 12
3-6-26-6               0       0       0.33    2-2-25-2              ref 12

The atom codes are: 1 = C sp3, 2 = C sp2, 3 = C (of C=O), 5 = H, 6 = doubly bonded (two-coordinate) oxygen, 7 = O (of C=O), 25 = P, 26 = Fe.
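The eclipsing term can be evaluated directly from the tabulated $V_1$, $V_2$, $V_3$. A minimal Python sketch, assuming (as is usual for MM2 torsion parameters, though the text does not state it) that the V values are in kcal/mol; only three rows of the table are copied in.

```python
import math

# (V1, V2, V3) for three Fe-containing torsions, copied from the table above
# (atom codes: 1 = C sp3, 3 = carbonyl C, 6 = two-coordinate O, 7 = carbonyl O, 26 = Fe)
TORSION_PARAMS = {
    "1-1-6-26": (0.8, 0.0, 0.09),
    "7-3-6-26": (3.28, 5.6, 0.0),
    "1-6-26-6": (0.0, 0.0, 0.33),
}


def torsion_energy(phi_deg: float, v1: float, v2: float, v3: float) -> float:
    """E = 1/2 [V1 (1 + cos phi) + V2 (1 - cos 2phi) + V3 (1 + cos 3phi)],
    with the dihedral angle phi given in degrees."""
    phi = math.radians(phi_deg)
    return 0.5 * (v1 * (1 + math.cos(phi))
                  + v2 * (1 - math.cos(2 * phi))
                  + v3 * (1 + math.cos(3 * phi)))
```

For the C-C-O-Fe torsion, `torsion_energy(0, *TORSION_PARAMS["1-1-6-26"])` gives the eclipsed value 0.89 and the anti conformation (180°) gives essentially zero.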

(iv) Non-bonded or van der Waals energy $E(r)$. This is based on a "Hill function"

$$E(r) = \varepsilon\left[-C_1\left(\frac{r^*}{r}\right)^6 + C_2 \exp\left(-C_3\,\frac{r}{r^*}\right)\right]$$

where $\varepsilon$ is an estimate of the iron "hardness", $r^*$ the sum of the van der Waals radii of the interacting atoms, $r$ the interatomic distance, and $C_1$, $C_2$ and $C_3$ numerical constants. In the MM2 parametrization (ref 8), $\varepsilon$ is usually estimated empirically. Since iron is a bulky atom, we estimated $\varepsilon = 0.17$, close to the value used for phosphorus. To evaluate $r^*$, we first plotted the van der Waals radii of several elements as a function of atomic number and observed parallel lines for the transition elements; extrapolating the line corresponding to the fourth period gives $r^*(\mathrm{Fe}) = 1.82$ Å, from which, by reference to vanadium (V), we obtained 2.2 Å for FeIII.
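A minimal Python sketch of the Hill-type function. The constants $C_1 = 2.25$, $C_2 = 2.90 \times 10^5$ and $C_3 = 12.50$ are the values usually quoted for Allinger's MM2; the text does not list them, so treat them as an assumption. `pair_r_star` simply adds two radii, following the definition of $r^*$ above.

```python
import math

# Numerical constants usually quoted for the MM2 Hill-type function
# (an assumption here; the text does not list them).
C1, C2, C3 = 2.25, 2.90e5, 12.50


def hill_energy(r: float, r_star: float, eps: float) -> float:
    """E(r) = eps * (-C1 (r*/r)^6 + C2 exp(-C3 r/r*)); same energy unit as eps."""
    return eps * (-C1 * (r_star / r) ** 6 + C2 * math.exp(-C3 * r / r_star))


def pair_r_star(r_i: float, r_j: float) -> float:
    """r* is the sum of the van der Waals radii of the two interacting atoms."""
    return r_i + r_j
```

With these constants the minimum of $E(r)$ falls very close to $r = r^*$, and the function becomes strongly repulsive below about $0.8\,r^*$; an Fe-X curve would be obtained as `hill_energy(r, pair_r_star(2.391, r_x), 0.17)` once a partner radius `r_x` (hypothetical here) is chosen.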

However, the van der Waals radii used by Allinger in MM2 differ systematically from the usual van der Waals values proposed by Bondi. Close examination of the correlation between Allinger's MM2 values and Bondi's "physical" values reveals a very good linear regression, with slope 0.856 and correlation coefficient R = 0.994. From this correlation, the FeIII radius suitable for MM2 calculations is estimated to be $r^*(\mathrm{Fe^{3+}}) = 2.391$ Å.
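The radius correction is a straight-line fit of MM2 radii against Bondi radii, followed by reading the FeIII value off the line. The text gives only the slope (0.856) and R (0.994), not the individual radii or the intercept, so the Python sketch below is exercised on synthetic numbers only; `to_mm2_radius` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = a*x + b; returns (a, b, R)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired values")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r


def to_mm2_radius(bondi_radius: float, a: float, b: float) -> float:
    """Map a 'physical' (Bondi-type) radius onto the MM2 scale with the fitted line."""
    return a * bondi_radius + b
```

Given real pairs of Bondi and MM2 radii, `linear_fit` would return the slope, intercept and R, and `to_mm2_radius` would then convert an extrapolated radius such as the 2.2 Å value above.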

(v) Treatment of the iron-oxygen interaction. The iron complexes we have studied are hexadentate, and the atoms directly bonded to iron are oxygen atoms. We have named the ligands APn (A for acid, P for pyrocatechol, n for the number of methylene units) (ref 14).
The interaction of iron with these ligands gives a complex [FeIII ligand] with an octahedral structure.

Scheme 1

In this molecular arrangement, two different iron-oxygen interactions have to be considered:

- When the oxygen atom is bonded to only one atom other than iron, it bears a negative charge: C-O⁻ ··· Fe³⁺.

In this case the curve E = f(l) is a parabola, $E = k(l - l_0)^2$, where $k = 3.26$ mdyn Å⁻¹ = 4.67 × 10² kcal Å⁻² mol⁻¹ and $l_0 = 1.9955$ Å, a mean value of Fe-O distances from X-ray data.

- When the oxygen atom is bonded to two atoms other than iron, (C)₂O ··· Fe³⁺, it bears no charge. In this case the curve E = f(l) may be derived from an energy minimum of −3 kcal mol⁻¹ (often observed for iron-heteroatom dative bonds) at an Fe-O distance of 1.9955 Å, and from E = 0 at the distances 1.977 and 2.014 Å, which correspond to the extreme values of Fe-O bonds in X-ray structures (ref 15). The equation of the parabola then becomes

$$E = -3 + k(l - 1.9955)^2$$

with $k = 0.83 \times 10^4$ kcal Å⁻² mol⁻¹ = 58 mdyn Å⁻¹.
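The unit conversions and the construction of the "neutral" oxygen parabola can be reproduced in a few lines. In the Python sketch below, 1 mdyn/Å corresponds to 143.93 kcal mol⁻¹ Å⁻² (one attojoule per molecule per Å²), which reproduces the quoted conversions for both oxygen types to within about 0.5%. `parabola_k` solves for the k that makes $-3 + k(l - l_0)^2$ vanish at a chosen distance; with the quoted distances it gives about 0.88 × 10⁴ kcal mol⁻¹ Å⁻², close to the quoted 0.83 × 10⁴.

```python
AJ_TO_KCAL_PER_MOL = 143.9325  # 1 mdyn·Å per molecule expressed in kcal/mol


def mdyn_per_A_to_kcal(k_mdyn: float) -> float:
    """Convert a force constant from mdyn/Å to kcal mol-1 Å-2."""
    return k_mdyn * AJ_TO_KCAL_PER_MOL


def kcal_to_mdyn_per_A(k_kcal: float) -> float:
    """Convert a force constant from kcal mol-1 Å-2 to mdyn/Å."""
    return k_kcal / AJ_TO_KCAL_PER_MOL


def parabola_k(depth: float, l_min: float, l_zero: float) -> float:
    """k such that E = -depth + k (l - l_min)^2 vanishes at l = l_zero."""
    return depth / (l_zero - l_min) ** 2


def neutral_oxygen_energy(l: float, k: float = 0.83e4, l0: float = 1.9955,
                          depth: float = 3.0) -> float:
    """E = -3 + k (l - 1.9955)^2 in kcal/mol, with the quoted k as default."""
    return -depth + k * (l - l0) ** 2
```

For example, `mdyn_per_A_to_kcal(3.26)` returns about 469 and `kcal_to_mdyn_per_A(0.83e4)` about 57.7.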


Fig. 2 E = f(l) for "negative" oxygens
Fig. 3 E = f(l) for "neutral" oxygens

The steep shape of this curve means that the "neutral" (doubly bonded) oxygens are kept close to the iron atom (and the van der Waals interaction between these two atoms is omitted from the total energy).

Treatment  of  the  electrostatic  forces. 

Molecular mechanics methods have recently been applied to estimate the complexation constants of alkali, alkaline-earth and other metal cations with organic ligands (refs 16-18). The complexation of iron(III) has not been treated, except in a recent paper by Lifson et al., who used a force field method incorporating electrostatic forces (ref 19). We take our series of complexes to exemplify the procedure:




The total charge of the ion is +3 + 4(−1) + 2(0) = −1 (although a fractional negative charge of −0.2 (ref 20) could be given to the "neutral" oxygens to represent the electrostatic attraction between iron and the oxygen lone pairs). Since the iron atom faces the electron pairs of both "neutral" and "negative" oxygens, the same van der Waals repulsive potential may be used in both cases, either an $(r^{-6}, \exp)$ or an $(r^{-9}, r^{-6})$ expression. The $(r^{-6}, \exp)$ "Hill function" used by Allinger, which incorporates the van der Waals radius and the "hardness" of iron, gives a strongly destabilizing energy at distances near 2 Å, close to those of Fe-O bonds. The $(r^{-9}, r^{-6})$ potential ($V = A/r^9 - C/r^6$) may also be used to model the electrostatic potential (ref 22); in this case the A and C coefficients are deduced from the iron and oxygen parameters by two calculation procedures that give almost the same result.

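The potential

$$V_{9\text{-}6} = \varepsilon\left[2\left(\frac{r^*}{r}\right)^9 - 3\left(\frac{r^*}{r}\right)^6\right]$$

with the iron parameters $\varepsilon = 0.0192$, $r_1^* = 3.84$ Å, charge = +3, and the oxygen parameters $\varepsilon = 0.0198$, $r_2^* = 3.65$ Å, charge = −1, gives the curve of fig 4, whereas the potential

$$V_{9\text{-}6} = \frac{A}{r^9} - \frac{C}{r^6}$$

with the iron coefficients $A_1 = 2\varepsilon r^{*9} = 6.97 \times 10^3$ and $C_1 = 3\varepsilon r^{*6} = 1.85 \times 10^2$, the oxygen coefficients $A_2 = 2\varepsilon r^{*9} = 4.55 \times 10^3$ and $C_2 = 3\varepsilon r^{*6} = 1.40 \times 10^2$, and the iron-oxygen coefficients $A = \sqrt{A_1 A_2}$ and $C = \sqrt{C_1 C_2}$, gives the curve of fig 5.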
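The two forms of the 9-6 potential are the same function once $A = 2\varepsilon r^{*9}$ and $C = 3\varepsilon r^{*6}$. The Python sketch below recomputes the coefficients from the quoted ε and r* values, combines them by geometric means as described, and locates the minimum of the Fe-O curve from dV/dr = 0. With geometric-mean combination the minimum falls at $\sqrt{r_1^* r_2^*}$ with depth $-\sqrt{\varepsilon_1\varepsilon_2}$, which serves as a convenient check.

```python
import math
from typing import Tuple


def coefficients(eps: float, r_star: float) -> Tuple[float, float]:
    """A = 2 eps r*^9 and C = 3 eps r*^6, so that A/r^9 - C/r^6 equals
    eps * (2 (r*/r)^9 - 3 (r*/r)^6)."""
    return 2 * eps * r_star ** 9, 3 * eps * r_star ** 6


def v96(r: float, a: float, c: float) -> float:
    """9-6 potential V = A/r^9 - C/r^6."""
    return a / r ** 9 - c / r ** 6


def minimum_distance(a: float, c: float) -> float:
    """dV/dr = -9A/r^10 + 6C/r^7 = 0  =>  r^3 = 3A / (2C)."""
    return (1.5 * a / c) ** (1.0 / 3.0)


# Iron and oxygen parameters (eps, r*) quoted in the text
A1, C1 = coefficients(0.0192, 3.84)
A2, C2 = coefficients(0.0198, 3.65)
# Iron-oxygen coefficients by geometric means
A, C = math.sqrt(A1 * A2), math.sqrt(C1 * C2)
R_MIN = minimum_distance(A, C)
```

Evaluating `v96(r, A, C)` over a range of r reproduces the shape of the Fe-O curve; `R_MIN` is close to 3.74 in the same length unit as r*.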


Comparison of the potential energy curves 1, 3 and 4 shows different values of the energy minimum but very similar shapes near $l_0$. In other words, the geometries calculated by the two methods would be very similar, but the absolute values of the strain energy would differ notably. In addition, the value $l_0 = 1.87$ Å used by Lifson seems short compared with known X-ray Fe-O bond lengths; on the other hand, the curve of figure 3 that we used is sharp, which implies a narrow range of bond length variation.

In short, the procedure described in this paper and that reported by Lifson should give very similar geometries and slightly different absolute, but very comparable relative, strain energies. When the purpose is to compare relative complexing abilities within a given series, both procedures should be helpful.


RESULTS AND DISCUSSION.

We carried out the calculations on an HP 9000/825 computer with BLEMO (refs 21-22), an adapted molecular mechanics program written in FORTRAN 77, containing 3731 lines and accepting molecules of up to 135 atoms. The iron-related parameters are those described in this paper; otherwise we used the 1985 MM2 parameters described by Allinger et al. (ref 8). The minimization used a step-by-step relaxation procedure carried out on internal coordinates.
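The text does not detail BLEMO's relaxation algorithm. The Python sketch below is therefore a generic coordinate-by-coordinate (compass-type) search over internal coordinates, offered only as an illustration of what a step-by-step relaxation can look like; the toy quadratic energy in the test stands in for the force field.

```python
from typing import Callable, List, Sequence


def relax(energy: Callable[[Sequence[float]], float], q0: Sequence[float],
          step: float = 0.1, min_step: float = 1e-6,
          max_sweeps: int = 10000) -> List[float]:
    """Move one internal coordinate at a time by +/- step, keep any move that
    lowers the energy, and halve the step when a full sweep over all
    coordinates brings no improvement. Stops once step < min_step."""
    q = list(q0)
    e = energy(q)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(q)):
            for delta in (step, -step):
                trial = q.copy()
                trial[i] += delta
                e_trial = energy(trial)
                if e_trial < e:  # accept only downhill moves
                    q, e = trial, e_trial
                    improved = True
                    break
        if not improved:
            step /= 2
            if step < min_step:
                break
    return q
```

Because only downhill moves are accepted, the energy decreases monotonically; a production code would normally use gradients (for example Newton-Raphson steps) for faster convergence.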

The purpose of the calculations was to find the most efficient iron complexant in the APn series. These molecules, which have some "similarity" with EDTA derivatives, contain two carboxy-α-catechol moieties separated by a spacer of several methylene units. The energy of the optimized geometry of each iron chelate was obtained by starting from the fully symmetrical octahedral structure (scheme 1), built with "natural" geometrical parameters, and reaching an energy minimum after full relaxation. Figure 6 clearly shows that the most stable iron chelate is obtained when the number of methylene units is n = 4.


Fig. 6 Energy vs n in APn


Examination of the possible isomers in each case shows 4 geometrical arrangements, depending on whether the catechol and carboxylic functions occupy equatorial or axial positions (ref 23). Moreover, each geometrical isomer exists in the threo or erythro configuration. In the case of AP4, the complex of lowest energy corresponds to the threo form; we have therefore undertaken the preparation of the appropriate isomers in the APn series by a synthetic route involving α-bromination of an α,ω-dicarboxylic acid, followed by nucleophilic substitution with guaiacol under phase-transfer catalysis conditions and deprotection of the methyl ether; the syntheses are in progress (ref 14).

In addition, several series of siderophores with structures homomorphic to enterobactin, such as CYCAM (ref 24), TRENCAM (ref 25) or MECAM (ref 26), have been prepared. All of them contain 3 catechol units attached to a spacer of varying nature. We are using the calculations described in this paper to design and prepare new iron complexants of this type.


CONCLUSION 

In summary, we have extended molecular mechanics calculations to the APn series of iron complexants using the BLEMO program with MM2 parameters and novel iron-related parameters. Calculated geometries are consistent with previous X-ray determinations of similar derivatives. The relative energies allow the most efficient complexant in the series to be selected. The method may be helpful in tailoring new ligands.
DISCUSSION


DEVILLERS - Is it possible to use the parameters designed for iron in this study in the program BIG STRAIN 3?

BOURAOUI - BIG STRAIN 3 is one of the major molecular mechanics (MM) programs available from QCPE, but each program runs with its own parametrization. The parameters designed in this study are fitted to the Allinger MM2 parameters. However, the same methodology may be followed to design iron parameters suitable for another set of constants.


DEVILLERS - Is the octahedral arrangement of the 6 oxygens surrounding the iron atom symmetrical or not?

BOURAOUI - The calculation is started from a fully symmetrical octahedral structure, but when an energy minimum is reached after relaxation the octahedron is slightly distorted. This result agrees with recent X-ray studies by Raymond et al. on other siderophores.


BRUNEAU - How exactly is iron parametrized?

BOURAOUI - From geometrical and spectroscopic data: the van der Waals radius of iron is derived from a value given by Bondi, corrected to fit the other MM2 values; the reference length l0 is derived from X-ray data; the constants k_l and k_b are derived from IR data.


LAMBERT - What is the interest of these ligands?

BOURAOUI - The goal is to control the concentration of iron by complexing and carrying this element, both in humans and in plants.
DEMANGE - Is it possible to predict the energy of other iron complexes with ligands having different structures, and to compare these values?

BOURAOUI - The calculations we have carried out relate to the same molecular series, so the relative values we obtain are reasonably accurate. Absolute energies for totally different structures would be very useful, but at the moment they need further comparison with experimental data.


DEMANGE - Could you treat other ligands in which the ferric ion is complexed through sulfur atoms?

BOURAOUI - In principle there is no fundamental objection and no major difficulty in designing the appropriate parameters and doing these calculations.
MEHANI - What industrial applications do you expect for these iron complexants?

BOURAOUI - They may be helpful in agriculture for treating iron chlorosis, a lack of iron in plants that turns the leaves yellow and eventually makes the plants dry out. In addition, they may be helpful in humans for adjusting the level of iron, which frequent blood transfusions can make toxic.


MEHANI - Is the calculation method easy to operate?

BOURAOUI - Yes, the program has been devised to be run by non-specialists. It is written in FORTRAN 77, contains 3731 lines and accepts up to 135 atoms. It is conversational, it can be modified easily, and it optimizes more rapidly than the popular 1985 Allinger MM2 program.
FOURCOT - Is it possible to extend the calculations to other alkali, alkaline-earth or metal cations?

BOURAOUI - Using the procedure we have followed, this extension is certainly possible, provided the appropriate parameters can be designed.


FOURCOT - Exactly which MM method have you used?

BOURAOUI - It is called BLEMO; it performs the usual energy minimization of the classical mechanical terms, including stretching, bending, eclipsing and van der Waals. The parameters for iron are described in this study; the parameters for C, H and O are from Allinger's MM2. The program is available on request from B. Blaive.


GRAND - Where do the molecules you have studied come from?

BOURAOUI - We have prepared these molecules and described the synthetic methods, which require protection-deprotection of the carboxylic acid and of guaiacol, in a Bull. Soc. Chim. paper to be published soon.


GRAND - Did you compare your calculated geometries with X-ray values?

BOURAOUI - We plan to do so as soon as we have single crystals of sufficient size. At present the crystals we obtain are too small for an X-ray analysis.


WARSHEL - How do the experimental complexation constants compare with the calculated energies of the complexes?

BOURAOUI - We are presently having difficulties with these measurements because of the low solubility of the complexes in water; the problem is being solved by increasing the hydrophilicity of the complexes via sulfonation of the aromatic rings.
WIPFF - Do you consider the charges on the oxygens and on the iron?

BOURAOUI - The calculations consider only the van der Waals interactions. However, we have compared two procedures, the first without the charges and the second with them. The two approaches give comparable potential curves and consequently similar relative energies.



Modelling  of  Molecular  Structures  and  Properties.  Proceedings  of  an  International  Meeting, 
Nancy,  France,  11-15  September  1989,  J.-L.  Rivail  (Ed.) 

Studies  in  Physical  and  Theoretical  Chemistry,  Volume  71,  pages  395-400 
©  1990  Elsevier  Science  Publishers  B.V.,  Amsterdam  —  Printed  in  The  Netherlands 


RELATION BETWEEN FORCE CONSTANTS AND ELASTIC CONSTANTS IN VARIOUS

SUMMARY

A general matrix method is used to calculate vibrational frequencies and elastic constants from a generalized valence force field (GVFF). The validity of the method is discussed, and the parameters that most influence the elastic behaviour of the compounds studied are identified.


INTRODUCTION 

The aim of this study is to relate the macroscopic elasticity of a compound, defined by the elements C_ij of the elasticity tensor, to its microscopic elasticity, defined by the bond-stretching, angle-bending, torsion, etc. force constants of a generalized valence force field.


METHODS USED


The force constants are determined from the experimental vibrational spectra using Wilson's GF matrix method (ref. 1), extended to the crystal according to the theory of Shimanouchi et al. (ref. 2).
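In Wilson's GF method the vibrational frequencies follow from the eigenvalues λ of the product GF, with λ = 4π²c²ν². As a minimal, self-contained illustration (not the crystal extension of Shimanouchi et al. used in this work), the Python sketch below treats the two bond stretches of a linear symmetric X-Y-X molecule, for which the eigenvalues are known in closed form: μ_X(k + k12) for the symmetric stretch and (μ_X + 2μ_Y)(k − k12) for the antisymmetric one, with μ = 1/m. The force constants in the usage note are illustrative values only.

```python
import math
from typing import List, Tuple

C_CM_PER_S = 2.99792458e10   # speed of light, cm/s
AMU_KG = 1.66053906660e-27   # atomic mass unit, kg
Matrix = List[List[float]]


def matmul2(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def eig2(m: Matrix) -> Tuple[float, float]:
    """Eigenvalues (increasing) of a 2x2 matrix with a real spectrum."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return (tr - disc) / 2, (tr + disc) / 2


def linear_xyx_stretches(m_x: float, m_y: float, k: float, k12: float) -> Tuple[float, float]:
    """GF eigenvalues (mdyn Å-1 amu-1) for the two stretches r1, r2 of linear X-Y-X.
    G12 = mu_Y cos(180°) = -mu_Y; F12 = k12 is the stretch-stretch interaction."""
    mu_x, mu_y = 1 / m_x, 1 / m_y
    g = [[mu_x + mu_y, -mu_y], [-mu_y, mu_x + mu_y]]
    f = [[k, k12], [k12, k]]
    return eig2(matmul2(g, f))


def wavenumber(lam: float) -> float:
    """Convert a GF eigenvalue (mdyn Å-1 amu-1) into a wavenumber in cm-1."""
    lam_si = lam * 100.0 / AMU_KG          # (N/m)/kg = s^-2
    return math.sqrt(lam_si) / (2 * math.pi * C_CM_PER_S)
```

For example, `linear_xyx_stretches(15.995, 12.000, 16.0, 1.3)` (CO2-like masses with illustrative force constants in mdyn/Å) returns the two eigenvalues, and `wavenumber` turns each into cm⁻¹; refining F until these match observed bands is the essence of the force field determination described above.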

The elements C_ij of the elasticity tensor were calculated from the force constants by the matrix method developed by Shiro et al. (ref. 3) on the basis of the theory of Born and Huang (ref. 4). The matrix of the elasticity tensor is given by the general formula:

$$C = \frac{1}{v}\left[\tilde{D}_R F^\circ D_R - \tilde{D}_R F^\circ B_p\left(\tilde{B}_p F^\circ B_p\right)^{-1}\tilde{B}_p F^\circ D_R\right]$$

in which F° is the potential energy matrix (force constants), D_R is the dynamical matrix defined from the matrix of internal coordinates and the atomic positions, B_p is the matrix transforming Cartesian coordinates into internal coordinates, the tilde denotes the transpose, and v is the volume of the unit cell. The program we wrote also calculates the partial derivatives of the elastic constants with respect to the force constants (potential energy distribution, PED).
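The elasticity formula above is an unrelaxed term minus an internal-relaxation correction. A minimal pure-Python sketch, assuming F° is expressed in internal coordinates (n × n), D_R (n × 6, or fewer columns for a reduced problem) gives the change of the internal coordinates under homogeneous strain, and B_p (n × m) their change under m internal displacements; these shape conventions are our reading of the formula, not taken from Shiro et al., and the tiny matrices in the usage note are made up.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Return X with a X = b (Gauss-Jordan elimination with partial pivoting)."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix (B~ F B)")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def elastic_matrix(F: Matrix, D: Matrix, B: Matrix, v: float) -> Matrix:
    """C = (1/v) [D~ F D - D~ F B (B~ F B)^-1 B~ F D], where ~ is the transpose."""
    Dt, Bt = transpose(D), transpose(B)
    FD, FB = matmul(F, D), matmul(F, B)
    unrelaxed = matmul(Dt, FD)
    correction = matmul(matmul(Dt, FB), solve(matmul(Bt, FB), matmul(Bt, FD)))
    return [[(u - c) / v for u, c in zip(ur, cr)] for ur, cr in zip(unrelaxed, correction)]
```

With F° = [[2, 1, 0], [1, 2, 0], [0, 0, 1]], D_R = [[1], [0], [0]], B_p = [[0], [1], [0]] and v = 1, the unrelaxed term is 2 and the relaxation correction 0.5, so `elastic_matrix` returns [[1.5]]: letting the internal coordinate relax softens the response, as expected for a correction of this form.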

The calculations were carried out on a UNIVAC 1110 computer.

The force field can thus be determined from the experimental vibrational frequencies and/or from the experimental C_ij elements. In this study we were interested in compounds whose experimental vibrational frequencies and elastic constants were known, in order (i) to assess the validity of the method, particularly for iono-covalent compounds such as oxides, and (ii) to understand the elastic behaviour of the compounds studied.

Fig. 1. Structure of the compounds studied, with the definition of the main internal coordinates introduced.

COMPOUNDS STUDIED

We studied compounds that differ widely in chemical nature and symmetry type:

- carbon compounds: diamond, graphite (and the related compound boron nitride, BN), polyethylene;

- oxides: α-quartz and α-alumina;

- a mixed oxide, barium titanate, BaTiO3.

For all these compounds we used literature data for the assignment of the vibrational frequencies to the various symmetry modes, as well as the experimental values of the elastic constants.

RESULTS

Figure 1 shows the structures of the compounds studied, with the definition of the main internal coordinates introduced. Table 1 lists the values of the corresponding force constants, together with the experimental values of the compressibility and shear elastic constants and the PED for each compound.

The calculations showed that:

1) the generalized valence force fields used are entirely adequate for calculating, to a good approximation, both the elastic constants and the vibrational frequencies, even for iono-covalent compounds with numerous interactions; the frequencies are generally fitted to within about 3%, and the mean error on the elastic constants is below 20%, which is satisfactory given that the largest errors occur for the off-diagonal elements of the elasticity tensor, which generally have small and imprecise values.

2) since the relative contribution of each type of bond and each type of angle to the elastic constants is known, it is possible to identify the parameters that most influence the elastic behaviour of the compounds.

The results for the carbon compounds show that the elastic constants are very sensitive to the dimensionality of the covalent network. First, the compressibility constants of diamond and graphite, and C33 of polyethylene, show contributions from both C-C bond stretching and C-C-C angle bending, whereas the latter dominate in the shear constants.

Comparing these results, we observe that:

- the C11 constants of diamond (three-dimensional, 3D) and graphite (2D) are equal, whereas the C-C stretching and C-C-C bending force constants are about twice as large in graphite as in diamond;

- the C11 constant of diamond (3D) is about four times larger than the C33 constant of polyethylene (1D), although the corresponding force constants are twice as large in polyethylene as in diamond.

It can therefore be said that, for comparable force constants, the compressibility elastic constants increase with the dimensionality of the covalent network.

TABLE 1

Main force constants, compressibility and shear constants, and PED for the compounds studied.

Compound        Force constants               Elastic constants
(symmetry)      definition         value(a)   Cij            value(b)   PED(c)
C diamond       d(C-C)             3.80       C11            1.06       d + γ
(Oh)            γ(C-C-C)           0.51       C44            0.57       d + γ
C graphite      d(C-C)             6.88       C11            1.06       d + γ
(D6h)           γ(C-C-C)           1.08       C33            0.036      l
                l (interlayer      0.014      C44            0.004      l
                bond)                         C66            0.44       γ
Polyethylene    d(C-C)             4.71       C11            0.01       van der Waals bonds
(C2h)           γ(C-C-C)           1.10       C22            0.009      van der Waals bonds
                                              C44, C55, C66  <0.003     van der Waals bonds
                                              C33            0.235      d + γ
α-quartz        d(Si-O)            5.77       C11            0.09       γ + α
(D3)            γ(O-Si-O)          0.75       C33            0.107      γ + α
                α(Si-O-Si)         0.22       C44            0.06       d + γ + α
                                              C66            0.04       γ + α
α-alumina       d(Al-O)            1.15       C11            0.50       d + γ + α
(D3d)           γ(O-Al-O)          0.48       C33            0.50       d + γ + α
                α(Al-O-Al)         0.28       C44            0.15       d + γ + α
                                              C66            0.17       d + γ + α
BaTiO3          d(Ti-O)            0.87       C11            0.275      d + D
(C4v)           γ(O-Ti-O)          0.42       C33            0.165      d + D
                D(Ba-O)            0.19       C44            0.054      γ + D
                                              C66            0.113      γ + D

(a) Bond stretching in mdyn Å⁻¹; angle bending in mdyn Å rad⁻².
(b) In mdyn Å⁻².
(c) Potential energy distribution (see text).

The results for barium titanate show the influence of the orientation of the bonds with respect to the crystal axes. In this compound the framework consists of corner-sharing TiO6 octahedra, while the much more ionic Ba-O network is not very rigid. The compressibility constants include a very large contribution from the stretching of the Ti-O bonds, which are parallel to the crystal axes, whereas the shear constants have contributions from O-Ti-O angle bending and Ba-O bond stretching. In this case, therefore, the rigidity of the Ti-O bonds can be related directly to the compressibility constants.

Finally, the study of α-quartz and α-alumina showed the importance of atomic coordination in the elastic behaviour of a compound. In quartz, silicon and oxygen have coordination numbers of 4 and 2 respectively; in alumina, aluminium has coordination number 6 and oxygen 4. In quartz, Si-O bond stretching plays practically no role in either the compressibility or the shear constants; this explains the low values of C11 and C33: the structure is regarded as flexible (ref. 5) despite the high rigidity of the Si-O bonds. The low coordination of the oxygen atoms is the origin of this flexibility: under compression, the Si-O-Si angles deform easily, causing the O-Si-O angles to deform but leaving the Si-O bonds unchanged. These results agree well with structural studies under pressure by Jorgensen (refs 5, 6) and Le Page et al. (ref. 7).

In contrast, in α-alumina, where the atomic coordination is higher, the crystal lattice is hard to deform. Al-O bond stretching contributes, sometimes predominantly, alongside O-Al-O and Al-O-Al angle bending, to both the compressibility and the shear constants. This explains why the compressibility constants of α-alumina are about five times larger than those of α-quartz, whereas the force constant of the Al-O bonds is about five times smaller than that of the Si-O bonds.

CONCLUSION

This study has made it possible to relate macroscopic and microscopic elasticity in chemically very different compounds. For each of them we were able to determine a generalized valence force field that gives a good approximation of both the vibrational frequencies and the elastic constants. The next step for such work is to predict the elastic behaviour of materials whose elasticity tensor elements C_ij are not known, from structural data and force fields derived from vibrational spectroscopy.
SUMMARY

The distinct vibrational frequencies of 2-pyridone as a monomer and as a centrosymmetric dimer are studied as a function of concentration in CHCl3 (or CDCl3) and CH3CN (or CD3CN) solutions. Distinct frequencies have been observed for about one half of the fundamental modes, not only for those directly involved in hydrogen-bond association (modes of the NH and CO groups) but also for ring modes sensitive to the association state. The present results can serve to distinguish a free from a hydrogen-bonded nucleic base.


INTRODUCTION 

Vibrational spectroscopy, especially ultraviolet resonance Raman spectroscopy, is often used to study the association or tautomerism of nucleic bases, structural changes in nucleic acids, and the interaction of nucleic acids with proteins, metal ions or drugs (ref. 1). For such purposes, it would be useful to correlate the frequency shifts or intensity variations of the lines with the association state of the nucleic bases.

The nucleic bases, which possess a pyrimidine or purine skeleton, may give rise to tautomerism involving their hydroxy- (oxo-) or amino- (imino-) groups.

Such tautomerism may be correlated with mutagenesis (refs. 2-4), because hydrogen-bond association between two bases leads to mispairing when one of the bases is present in a rare tautomeric form. The 2-pyridone molecule is a good model for studying such phenomena: it has fewer heteroatoms than the nucleic bases and hence fewer possibilities of association.

For heterocyclic molecules, tautomerism (ref. 5) and infrared spectra (refs. 6-8) have been reviewed. A separate review covers the pyridine derivatives (ref. 9).

*Taken in part from a thesis (Doctorat d'Etat) to be submitted to Paul Sabatier University, Toulouse, France, by S. Castillo. Preliminary results were presented on the occasion of the 100th birthday of Professor G. Mignonac (Toulouse, 1989) and at the "Journees de Chimie Physique" (Paris, May 1989).
Unsubstituted 2-pyridone occurs predominantly as the lactam tautomer in the solid state and in aqueous solution (refs. 5,9). In the solid state it forms helicoidal chains (refs. 10-11). In various solvents there is an equilibrium between the 2-pyridone monomer (M) and its centrosymmetric cyclic dimer (D):

2 M ⇌ D


Many previous papers give data on the vibrational spectra of 2-pyridone (refs. 12-22), but only a few have proposed assignments, which are partial and sometimes conflicting (refs. 16,18,21). Isotopic substitution 18O/16O (refs. 19-20), ND/NH and 15N/14N (ref. 20) has shown that the νC=O and δNH motions are coupled with motions of the ring, but the experimental data have not been fully interpreted. Distinct monomer and cyclic-dimer frequencies for the same vibrational mode have been assigned only for the νNH (refs. 13,15) and νC=O (ref. 22) modes.

In studies on pyridinium (refs. 23-24), pyrazinium (refs. 23,25) and pyrimidinium (refs. 23,26) salts, and on 2-pyrimidone and 2-pyrimidone hydrochloride (refs. 27-28), some ring frequencies have been found to be sensitive to the bonding of a hydrogen atom to nitrogen (NH or NH+). Recent experimental and theoretical work on heterocyclic molecules such as pyridine (refs. 29-30), γ-pyrone (ref. 31) and uracil (ref. 32) may also be used to refine the assignment of the 2-pyridone vibrational spectra.

The complete assignment of the vibrational spectra of 2-pyridone and N-methyl-2-pyridone will be published elsewhere (ref. 33). In the present work we emphasize only those frequencies that differ between the 2-pyridone monomer and dimer. These may serve to distinguish a free nucleic base bearing an oxo group from a hydrogen-bonded one.


VIBRATIONS OF THE NH AND CO GROUPS

The NH and CO groups are directly involved in the hydrogen bond and are consequently expected to be the most sensitive to the association state.

Modes νNH and νCO (table 1 and figures 1 (νNH) and 2 (νCO))

Notably, the νNH band of the cyclic dimer (0.5 M solution of 2-pyridone in CDCl3) is centered at a lower frequency than the νNH band of the solid. Hence the N-H...O hydrogen bond is stronger in the centrosymmetric dimer than in the helicoidal chains of the solid. Similar data have been reported previously in the νNH (refs. 13,15) and νCO (ref. 22) ranges and interpreted in the same way.
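The argument rests on a simple comparison of band positions. The sketch below takes the monomer νNH band in CDCl3 from Table 1 as the "free" N-H reference. That choice of reference is ours, and it compares a solution value with a solid-state value. The sketch follows the usual convention that a larger downshift indicates a stronger N-H...O hydrogen bond.

```python
# Infrared nu(NH) band positions in cm^-1, copied from Table 1
NU_NH_FREE = 3397.0    # monomer band in CDCl3, taken here as the "free" N-H reference
NU_NH_DIMER = 2829.0   # centrosymmetric dimer, 0.5 M in CDCl3
NU_NH_SOLID = 3071.0   # solid state (KBr pellet), helicoidal chains

def nh_downshift(nu_bonded: float, nu_free: float = NU_NH_FREE) -> float:
    """Downshift of the N-H stretching band relative to the free monomer band."""
    return nu_free - nu_bonded

shift_dimer = nh_downshift(NU_NH_DIMER)   # 568.0 cm^-1
shift_solid = nh_downshift(NU_NH_SOLID)   # 326.0 cm^-1
# The larger downshift for the dimer is read as the stronger N-H...O bond.
print(f"dimer: {shift_dimer:.0f} cm-1, solid: {shift_solid:.0f} cm-1")
```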
TABLE 1. Vibration modes of 2-pyridone sensitive to the association state. Distinct infrared frequencies (cm-1) for monomer and dimer in chloroform and acetonitrile solutions(a).

Solid(b)  Assignment(c)  Solution frequencies(d)                    Monomer (M)
                         [CDCl3 0.5 M, CDCl3 10^-2 M,               or Dimer (D)
                          CH3CN 0.5 M(e), CH3CN 10^-2 M;
                          empty cells omitted]

3071      νNH            3397, 3397, 3312(h), 3317                  M
                         2829, 2828, 2827(h), 2828                  D
1652      νC=O           1674, 1674, 1676(i,j), 1676                M
                         1656, 1657, 1659(i,j), 1659                D
1455      19a            1474(g,j), 1475(h), (f)                    M
                         1472, 1471(j), 1472(h), (f)                D
1430sh    19b            1442, 1442, 1443(h), (f)                   D
                         1418, 1416, 1425(h), (f)                   M
1363      14             1376, 1376, 1376(h,j), (f)                 D
                         1364, 1363, 1367(h,j), (f)                 M
1241      δNH            1254, 1254, 1251                           D
                         1246, 1240, 1240                           M
1232      δCH            1234, 1234, 1236(j), 1237                  D
                         1212, 1217, 1214(j), 1213                  M
1098      δCH            1099, 1099, 1099                           D
                         1095, 1090, 1089                           M
981       γNH            995, 995, 990, 991                         D, M
845       1              848, 846, 851, 850                         D
                         830, 828, 828                              M
730       γCH            (f), (f), 730                              D
                         725, 724                                   M
560       δCO            562, 562, 560, 559                         D
                         549, 549, 549                              M
475       γCO            512, 510                                   M
                         494, 495, 494                              D

a- The concentrations have been selected for the following reasons:
 - for a 0.5 M concentration in CHCl3, the dimer is widely predominant;
 - in 0.2 M solution in CH3CN or 10^-2 M solution in CHCl3, for most of the modes the doublet components relative to monomer and dimer have intensities of the same order of magnitude;
 - for a 10^-2 M solution in CH3CN the dimer is widely predominant.
b- KBr pellet. Many shoulders appear on the solid-state spectrum of 2-pyridone, and it is sometimes difficult to link the solid and solution spectra. The assignment in terms of monomer and dimer applies only to the spectra of 2-pyridone solutions.
c- Ring modes in benzene notation (ref. 23).
d- For a doublet relating to the same vibration mode of the monomer and of the dimer, we have underlined the frequency of the more intense component.
e- Except where otherwise mentioned.
f- Masked by a solvent band.
g- Uncertain; partially masked by a solvent band (poor compensation).
h- 2-pyridone 0.2 M in CD3CN.
i- 2-pyridone 0.2 M in CH3CN.
j- Nearly equal intensity for the monomer and dimer components at this concentration.
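Footnote a relies on how the monomer-dimer balance shifts with total concentration. The sketch below makes that dependence explicit under stated assumptions:

- the equilibrium 2 M ⇌ D is ideal;
- the dimerization constant is K = [D]/[M]^2;
- mass balance holds, c = [M] + 2[D].

No value of K is given in the text. The one used in the sketch is hypothetical and only illustrates the trend.

```python
import math

def monomer_fraction(c_total: float, k_dim: float) -> float:
    """Fraction of 2-pyridone molecules present as monomer for 2 M <=> D.

    Assumes ideal behaviour: K = [D]/[M]**2 (k_dim, in M^-1) and the mass
    balance c_total = [M] + 2[D] (c_total in mol/L).
    """
    if c_total <= 0.0:
        raise ValueError("c_total must be positive")
    if k_dim < 0.0:
        raise ValueError("k_dim must be non-negative")
    if k_dim == 0.0:
        return 1.0  # no dimerization at all
    # [M] is the positive root of 2*K*m**2 + m - c_total = 0
    m = (-1.0 + math.sqrt(1.0 + 8.0 * k_dim * c_total)) / (4.0 * k_dim)
    return m / c_total

K_HYPOTHETICAL = 100.0  # M^-1, illustrative only; not a measured value
for c in (0.5, 0.2, 1e-2):
    print(f"{c:g} M: {monomer_fraction(c, K_HYPOTHETICAL):.2f} of molecules as monomer")
```

Dilution always raises the monomer fraction. How fast it does so depends on K, which would differ between CHCl3 and CH3CN.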
Modes δNH and γNH (table 1 and figure 3 (δNH))

For these two modes, our assignment is consistent with previous assignments for pyridinium salts (refs. 23-24) and for 2-pyrimidone and its hydrochloride (refs. 27-28). It is also consistent with the 15N/14N isotopic shifts observed for 2-pyridone (ref. 20). Furthermore, the strong, broad infrared bands at 1241 and 981 cm-1 of solid 2-pyridone are missing from the infrared spectrum of pure N-methyl-2-pyridone (ref. 33), which supports this assignment. The dilution effect in both CH3CN and CDCl3 shows clearly that the δNH mode appears near 1250 cm-1 (D) and near 1240 cm-1 (M). For 0.2 M solutions of 2-pyridone in CH3CN or CD3CN, the major polarized Raman lines lie at 1252 (D) and 1239 cm-1 (M). In the 1260-1200 cm-1 range, a δCH mode is also observed near 1235 cm-1 (D) and near 1215 cm-1 (M). For the γNH mode, it has not been possible to observe distinct frequencies for the dimer and the monomer.


Fig. 3 Infrared spectra of 2-pyridone in the solid state and in solution, 1280-1190 cm-1 range.
1- solid (KBr); 2- CH3CN solution 0.5 M; 3- CH3CN solution 10^-2 M; 4- CDCl3 solution 0.5 M; 5- CDCl3 solution 10^-2 M.


Modes δCO and γCO (table 1 and figure 4)

Our interpretation, supported by dilution effects in CH3CN and CHCl3, is consistent with the 18O/16O isotopic shifts observed for 2-pyridone (ref. 19) and with studies on 2-pyrimidone (refs. 27-28) and 2-pyridinethione (ref. 34).


Fig. 4 Infrared spectra of 2-pyridone in the solid state and in solution. δCO and γCO ranges.
Fig. 5 Infrared spectra of 2-pyridone in the solid state and in solution. 1500-1350 cm-1 range.
RING VIBRATIONS SENSITIVE TO THE ASSOCIATION STATE
With the help of dilution effects in deuterated solvents (CD3CN or CDCl3), it has been possible to assign distinct monomer and dimer frequencies to the 19a, 19b and 14 ring modes in the 1500-1350 cm-1 range (table 1 and figure 5). Such data have not been reported before, because earlier solution studies used only non-deuterated solvents.

Other distinct frequencies for monomer and dimer (table 1) belong to a δCH mode near 1100 cm-1, ring mode 1 at 850 cm-1, and a γCH mode around 730 cm-1. For a 0.2 M solution of 2-pyridone in CH3CN, strong polarized Raman lines are observed at 843 (D) and 826 cm-1 (M) for ring mode 1.
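As a rough illustration of how Table 1 might be used to tell a free molecule from a hydrogen-bonded one, the sketch assigns an observed band to the monomer or the dimer according to the nearer reference frequency. The reference pairs are the CDCl3 10^-2 M values of Table 1, taken only for modes whose rows are complete. Three things are our assumptions, not part of the study:

- the mode keys;
- the 6 cm-1 tolerance;
- transferring these values to other oxo bases.

```python
from typing import Dict, Optional, Tuple

# (monomer, dimer) infrared frequencies in cm^-1, from Table 1 (CDCl3, 10^-2 M column)
REFERENCE: Dict[str, Tuple[float, float]] = {
    "nu_NH": (3397.0, 2828.0),
    "nu_CO": (1674.0, 1657.0),
    "ring_19b": (1416.0, 1442.0),
    "ring_14": (1363.0, 1376.0),
    "delta_CH": (1217.0, 1234.0),
}

def assign_band(mode: str, nu_obs: float, tolerance: float = 6.0) -> Optional[str]:
    """Return 'M' or 'D' for the nearer reference band, or None if neither
    reference lies within `tolerance` (cm^-1) of the observed band."""
    nu_m, nu_d = REFERENCE[mode]
    d_m, d_d = abs(nu_obs - nu_m), abs(nu_obs - nu_d)
    if min(d_m, d_d) > tolerance:
        return None
    return "M" if d_m < d_d else "D"
```

For example, a band at 1375 cm-1 in the ring-14 region is classed as dimer. A band at 1362 cm-1 is classed as monomer.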


REFERENCES 

1  Proceedings of the Eleventh International Conference on Raman Spectroscopy, 5-9 September 1988, London, England, Ed. by R.J.H. Clark and D.A. Long, John Wiley and Sons, 1988.

2  F. Chapeville, H. Clauser et al., Biochimie, Hermann, Paris, 1974, pp. 539-540 and 807-808.

3  L. Stryer, Biochemistry, W.H. Freeman and Co., San Francisco, 1975, pp. 640-641.

4  J.D. Watson, Biologie moleculaire du gene, 3eme ed., Intereditions, Paris, 1978, pp. 252-257 (French translation of "Molecular Biology of the Gene", Third Edition, Benjamin, Menlo Park, 1976).

5  J. Elguero, Cl. Marzin, A.R. Katritzky, P. Linda, The tautomerism of heterocycles, Advances in Heterocyclic Chemistry, Suppl. 1, Ed. by A.R. Katritzky and A.J. Boulton, Academic Press, New York, 1976, and references therein.

6  A.R. Katritzky, The infrared spectra of heteroatomic compounds, Quart. Revs. (London), 13 (1959) 353-373.

7  A.R. Katritzky and A.P. Ambler, in Physical Methods in Heterocyclic Chemistry, A.R. Katritzky, ed., vol. II, Academic Press, New York, 1963, ch. 10, Infrared Spectra, pp. 161-360.

8  A.R. Katritzky and P.J. Taylor, in Physical Methods in Heterocyclic Chemistry, A.R. Katritzky, ed., vol. IV, Academic Press, New York, 1971, ch. 6, Infrared spectroscopy of heterocycles, pp. 265-434.

9  H. Tieckelmann, in The Chemistry of Heterocyclic Compounds, vol. 14, A. Weissberger and E.C. Taylor, eds., Pyridine and its Derivatives, Supplement, R.A. Abramovitch, ed., Part Three, John Wiley and Sons, 1974, ch. 12, Pyridinols and Pyridones, pp. 597-1180.

10  B.R. Penfold, The electron distribution in crystalline α-pyridone, Acta Cryst. 6 (1953) 591-600.

11  U. Ohms, H. Guth, E. Hellner, H. Dannohl and A. Schweig, 2-pyridone, C5H5NO, crystal structure refinements at 295 K and 120 K, experimental and theoretical deformation density studies, Z. Kristallogr. 169 (1984) 185-200.

12  J.A. Gibson, W. Kynaston and A.S. Lindsey, The infrared spectra of some pyridones and quinolones and their behaviour in the Kolbe-Schmitt reaction, J. Chem. Soc. (1955) 4340-4344.

13  Hideyo Shindo, Studies on the infrared spectra of heterocyclic compounds. VI. Infrared spectra of substituted α-pyridones and α-quinolones. The region from 2000 to 4000 cm-1, Chem. Pharm. Bull. (Tokyo) 7 (1959) 407-416.

14  A. Albert and E. Spinner, The vibration spectra and structures of the hydroxypyridines and hydroxypyrimidines in aqueous solutions, J. Chem. Soc. (1960) 1221-1226.

15  L.J. Bellamy and P.E. Rogasch, Proton transfer in hydrogen bonded systems, Proc. Roy. Soc. (London) Ser. A 257 (1960) 98-108.


16  A.R. Katritzky and R.A. Jones, Infrared absorption of heteroatomic and benzenoid six-membered monocyclic nuclei. Part X. Pyridones and pyridthiones, J. Chem. Soc. (1960) 2947-2953.

17  R. Isaac, F.F. Bentley, H. Sternglanz, W.C. Coburn, Jr., C.V. Stephenson and W.S. Wilcox, The far infrared spectra of monosubstituted pyridines, Appl. Spectrosc. 17 (1963) 90-97.

18  E. Spinner and J.C.B. White, Spectral and ionisation constant studies of substituted 2-hydroxypyridines (1,2-dihydro-2-oxopyridines), J. Chem. Soc. (B) (1966) 991-995.

19  G.H. Keller, L. Bauer and C.L. Bell, Infrared spectra of 2-pyridone D0 and 2-pyridone 18O, Can. J. Chem. 46 (1968) 2475-2479.

20  R.A. Coburn and G.O. Dudek, Spectroscopic studies of isotopically substituted 2-pyridones, J. Phys. Chem. 72 (1968) 1177-1181.

21  J. Morcillo, M. Gil, D. Escolar, Espectros infrarrojos de oxo e hidroxi-derivados de la pyridina, An. Quim. 74 (1978) 1193-1198.

22  A. Fujimoto, K. Inuzuka and Ryuichi Shiba, Electronic properties and n-n* absorption spectrum of 2-pyridone, Bull. Chem. Soc. Jpn. 54 (1981) 2802-2806.

23  R. Foglizzo, Spectres de vibration de quelques sels de pyridinium, pyrazinium et pyrimidinium entre 3300 et 30 cm-1, These de Doctorat es Sciences Physiques, Paris, 1970.

24  R. Foglizzo and A. Novak, Spectres de vibration de quelques halogenures de pyridinium, J. Chim. Phys. 66 (1969) 1539-1550.

25  R. Foglizzo and A. Novak, Infrared and Raman spectra of pyrazinium halides, Appl. Spectrosc. 24(6) (1970) 601-605.

26  R. Foglizzo and A. Novak, Influence de la protonation et de la complexation sur les spectres de vibration de la pyrimidine, Spectrochim. Acta 26A (1970) 2281-2292.

27  E. Picquenard, Etude du noyau de la pyrimidone-2, mecanisme d'echange hydrogene-deuterium dans la trimethyl-4,5,6 pyrimidone-2, spectrometrie de vibration, These de Doctorat es Sciences Physiques, Paris, 1982.

28  E. Picquenard and A. Lautie, Etude par spectrometrie infrarouge et Raman de la pyrimidone-2 et de son chlorhydrate (chlorure d'oxo-2 pyrimidinium), Spectrochim. Acta 38A(6) (1982) 641-648.

29  G. Pongor, P. Pulay, G. Fogarasi and J.E. Boggs, Theoretical prediction of vibrational spectra. 1. The in-plane force field and vibrational spectra of pyridine, J. Am. Chem. Soc. 106 (1984) 2765-2769.

30  G. Pongor, G. Fogarasi, J.E. Boggs and P. Pulay, Theoretical prediction of vibrational spectra: the out-of-plane force field and vibrational spectra of pyridine, J. Mol. Spectrosc. 114(2) (1985) 445-453.

31  P. Csaszar, A. Csaszar, A. Somogyi, Z. Dinya, S. Holly, M. Gal and J.E. Boggs, Vibrational spectra, scaled quantum-mechanical (SQM) force field and assignments for 4H-pyran-4-one, Spectrochim. Acta 42A(4) (1986) 473-486.

32  L. Harsanyi, P. Csaszar, A. Csaszar, J.E. Boggs, Interpretation of the vibrational spectra of matrix-isolated uracil from scaled ab initio quantum mechanical force fields, Int. J. Quantum Chem. 29(4) (1986) 799-815.

33  S. Castillo, Th. Bouissou, J.F. Brazier, J. Favrot and A. Zwick, Assignment of the vibrational spectra of 2-pyridone in the solid state and in solution as centrosymmetric dimer. Comparison with N-methyl-2-pyridone, in preparation.

34  A. Lautie, J. Hervieu and J. Belloc, Spectres de vibration de la 2-pyridinethione, de la 3-pyridazinethione et du 2,4-dithiouracile, Spectrochim. Acta 39A(4) (1983) 367-372.




Modelling  of  Molecular  Structures  and  Properties.  Proceedings  of  an  International  Meeting, 
Nancy,  France,  11-15  September  1989,  J.-L.  Rivail  (Ed.) 

Studies  in  Physical  and  Theoretical  Chemistry,  Volume  71,  pages  409-415 
©  1990  Elsevier  Science  Publishers  B.V.,  Amsterdam  —  Printed  in  The  Netherlands 


KINETIC  MODELLING  OF  HETEROGENEOUS-CATALYZED  REACTIONS 
WITH  THE  ANACIN  SOFTWARE 

APPLICATION  TO  THE  HYDRODENITROGENATION  OF  PHENANTHRIDINE 
SUMMARY

The hydrodenitrogenation of heavy hydrocarbons is represented by a test consisting of the conversion of alkylanilines in the presence of nitrogen-containing polyaromatic compounds. These latter compounds, such as phenanthridine, strongly inhibit the conversion of alkylanilines.

To understand these inhibiting effects, a computer tool was needed to model the reactivity of the inhibitor. A preprogrammed first-order kinetic model was defined, taking into account the forward reactions, the reverse reactions and the adsorption constants of the reactants. The constants, computed by numerical integration, were optimized with the simplex algorithm. For phenanthridine hydrodenitrogenation, the best fit was obtained by optimizing all the constants.

The AnaCin software thus makes it possible to determine automatically up to twenty constants from one set of experimental data.
INTRODUCTION

Catalytic hydrodenitrogenation (HDN) is one of the key steps in upgrading heavy feedstocks by converting them into lighter products. Heavy ends have high nitrogen contents, and their conversion yields distillates that are still too rich in nitrogen. These distillates can be integrated into a normal refining scheme only after a severe HDN treatment (ref. 1). Indeed, combustion of N-compounds produces nitrogen oxides. The presence of nitrogen also poisons the metal and acid catalysts used in reforming, cracking and hydrocracking operations. Removal of nitrogen is therefore an absolute necessity for both ecological and economic reasons.
The major problem in the hydrodenitrogenation of distillates resulting from the conversion of heavy feedstocks is the conversion of basic compounds, alkylanilines in particular, in the presence of other compounds found in the feed. In such mixtures, alkylanilines appear to be more resistant to hydrodenitrogenation than when they are converted as single components. We have recently developed a simple test to simulate this inhibiting effect on a laboratory scale (ref. 2). The test consists of the conversion of 2,6-diethylaniline in the presence of quinoline or phenanthridine.


[Structures: 2,6-diethylaniline, quinoline, phenanthridine]


We have shown, in particular, that the inhibiting effect results from the presence of aromatic and of partially or totally saturated polycyclic compounds. The effect is less pronounced at low inhibitor concentration. It also depends on how far the conversion of the inhibitor has progressed: the inhibition disappears once the inhibitor has been transformed into lighter molecules. This clearly illustrates the problem of competitive adsorption between substituted alkylanilines and the heavy N-heteroatomic compounds present in the feeds.

To gain a better understanding of these complex inhibiting phenomena and to quantify them, a computer tool was needed. It would serve first to interpret the experimental kinetic results and then to model the inhibiting effects.

EXPERIMENTAL 

Experiments were carried out in a 0.3 litre stirred autoclave operating in batch mode at 340°C and 70 bars of hydrogen pressure. Analyses were performed on a Girdel 30 gas chromatograph equipped with a flame ionization detector, using hydrogen as carrier gas. The columns were CP Sil 5CB or CP Sil 19CB capillary columns, 25 m x 0.22 mm. Products were identified by comparison with authentic samples and/or by GC-MS analysis. The catalyst was Procatalyse HR 346, which has the following composition: 3% NiO, 14% MoO3 and 83% Al2O3. It was sulfided at atmospheric pressure using a fluidized bed technique with a gas mixture of 15% H2S and 85% H2 by volume. The catalyst was heated in flowing H2/H2S (gas flow: 120 ml/min) from 20°C to 400°C (8°C/min) and held at 400°C for 4 hours, then cooled and swept with nitrogen.

RESULTS  AND  DISCUSSION 

Scheme 1 shows the reaction network we propose for the hydrodenitrogenation of phenanthridine over the NiMo HR 346 catalyst at 340°C and 70 bars H2. Under these operating conditions the reaction proceeds through successive hydrogenation steps. All these steps are equilibrated, as shown by the effect of hydrogen pressure. Increasing the pressure from 70 to 140 bars progressively shifts the reactions towards the saturated compounds, as expected from thermodynamics.
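The pressure effect follows from the equilibrium condition for a single hydrogenation step. The derivation below is a sketch under three assumptions: ideal behaviour of H2, an effective equilibrium constant K that does not depend on pressure, and a step Y + n H2 ⇌ Y'. The number n of H2 molecules consumed is not specified in the text.

```latex
\frac{[\mathrm{Y'}]}{[\mathrm{Y}]} = K \left(\frac{p_{\mathrm{H_2}}}{p^{\circ}}\right)^{n}
\quad\Longrightarrow\quad
\frac{\left([\mathrm{Y'}]/[\mathrm{Y}]\right)_{140\ \mathrm{bar}}}
     {\left([\mathrm{Y'}]/[\mathrm{Y}]\right)_{70\ \mathrm{bar}}}
= \left(\frac{140}{70}\right)^{n} = 2^{n}
```

Doubling the hydrogen pressure multiplies the ratio of the more saturated to the less saturated compound by 2^n. For a step consuming two H2 molecules, that factor is 4.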


Scheme 1 : Reaction network for the hydrodenitrogenation of phenanthridine at 340°C, 70 bars, over the sulfided NiMo/Al2O3 HR 346 catalyst.


This scheme serves as a framework for modelling the reactivity of phenanthridine and, more generally, of complex reaction networks. We have therefore developed an interactive program, named AnaCin, to analyse the results of kinetic experiments. The software is easy for the average experimental chemist to use. AnaCin is written for the Borland Turbo Basic compiler and uses the 80x87 coprocessor to reduce computation time. The program runs on IBM PC, PS or compatible computers equipped with a CGA or EGA graphics card, under the Microsoft Disk Operating System (MS-DOS).

The preprogrammed kinetic model used by the software is given in Scheme 2. It involves six reactants, intermediates or reaction products, Y1, Y2 ... Y6. The A(i) are the rate constants of the forward reactions and the C(i) those of the reverse reactions. The b(i) are the adsorption constants of the different compounds (i). In this model the reactions are assumed to be first order in each reactant, and all the adsorption sites are assumed to be different.

          A(1)             A(2)
    Y1 ---b(1)---> Y2 ---b(2)---> Y3
       <---C(1)---    <---C(2)---
    |              |              |
  A(3) b(3)      A(4) b(4)      A(5) b(5)
  C(3)           C(4)           C(5)
    |              |              |
          A(6)             A(7)
    Y4 ---b(6)---> Y5 ---b(7)---> Y6
       <---C(6)---    <---C(7)---

Scheme 2 : Preprogrammed kinetic model of the AnaCin software

The system of differential equations resulting from the model is integrated numerically by the well-known Runge-Kutta method. The constants A(i), C(i) and b(i) are optimized with the simplex method (ref. 3) to fit the experimental data.
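The sketch below shows how such a network can be integrated with the classical fourth-order Runge-Kutta method. It assumes the arrows of Scheme 2 connect Y1→Y2→Y3, Y4→Y5→Y6, and vertically Y1→Y4, Y2→Y5 and Y3→Y6, with each step i first order in both directions (forward A(i), reverse C(i)). The adsorption constants b(i) are left out because the text does not say how they enter the rate law. The function names, step size and constant values are illustrative, not AnaCin's.

```python
from typing import List, Sequence, Tuple

# Scheme 2 as (reactant, product) pairs for steps i = 1..7 (species Y1..Y6 -> indices 0..5)
STEPS: List[Tuple[int, int]] = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5), (3, 4), (4, 5)]

def derivatives(y: Sequence[float], a: Sequence[float], c: Sequence[float]) -> List[float]:
    """dY/dt for first-order reversible steps: net flux = A(i)*Y_reactant - C(i)*Y_product."""
    dy = [0.0] * len(y)
    for (i, j), k_fwd, k_rev in zip(STEPS, a, c):
        flux = k_fwd * y[i] - k_rev * y[j]
        dy[i] -= flux
        dy[j] += flux
    return dy

def rk4_step(y: Sequence[float], h: float, a: Sequence[float], c: Sequence[float]) -> List[float]:
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = derivatives(y, a, c)
    k2 = derivatives([yi + 0.5 * h * d for yi, d in zip(y, k1)], a, c)
    k3 = derivatives([yi + 0.5 * h * d for yi, d in zip(y, k2)], a, c)
    k4 = derivatives([yi + h * d for yi, d in zip(y, k3)], a, c)
    return [yi + h / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
            for yi, d1, d2, d3, d4 in zip(y, k1, k2, k3, k4)]

def simulate(y0: Sequence[float], times: Sequence[float], a: Sequence[float],
             c: Sequence[float], h: float = 0.5) -> List[List[float]]:
    """Concentrations Y1..Y6 at each time in `times` (ascending, >= 0), starting from y0 at t = 0."""
    y, t, out = list(y0), 0.0, []
    for t_target in times:
        while t < t_target - 1e-12:
            step = min(h, t_target - t)
            y = rk4_step(y, step, a, c)
            t += step
        out.append(list(y))
    return out
```

Each flux removes material from one species and adds it to another, so the total concentration is conserved. This gives a quick check on any implementation.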

Application to the hydrodenitrogenation of phenanthridine

The experimental data (Table 1) are entered from the keyboard and can be corrected, saved to disk or loaded from disk.

The  experimental  points  are  visualized  on  the  screen  and  the 
aim  is  now  to  obtain  the  best  fit  between  experimental  and 
computed  curves  in  order  to  deduce  the  corresponding  rate 
constants.  This  can  be  done  manually  for  simple  reaction  schemes 
but  an  optimization  procedure  is  required  for  kinetically  and 
chemically  more  complex  reactions. 
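The text names the simplex method (ref. 3) but gives no details. The sketch below is a basic Nelder-Mead simplex (reflection, expansion, inside contraction, shrink) that minimizes a weighted sum of squared errors. For brevity it is applied to a single first-order decay y(t) = y0 exp(-kt) rather than to the full Scheme 2 network. That substitution, the unit default weights and the stopping tolerance are our assumptions.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

def nelder_mead(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                rel_step: float = 0.1, tol: float = 1e-12,
                max_iter: int = 5000) -> Tuple[List[float], float]:
    """Minimize f with a basic Nelder-Mead simplex. Returns (best point, best value)."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):  # initial simplex: perturb each coordinate by rel_step
        x = list(x0)
        x[i] = x[i] * (1.0 + rel_step) if x[i] != 0 else rel_step
        simplex.append(x)
    values = [f(x) for x in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(x[j] for x in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        refl = [2.0 * cj - wj for cj, wj in zip(centroid, worst)]
        f_refl = f(refl)
        if f_refl < values[0]:  # try expanding further in the same direction
            expd = [3.0 * cj - 2.0 * wj for cj, wj in zip(centroid, worst)]
            f_expd = f(expd)
            simplex[-1], values[-1] = (expd, f_expd) if f_expd < f_refl else (refl, f_refl)
        elif f_refl < values[-2]:
            simplex[-1], values[-1] = refl, f_refl
        else:  # contract the worst vertex towards the centroid
            contr = [0.5 * (cj + wj) for cj, wj in zip(centroid, worst)]
            f_contr = f(contr)
            if f_contr < values[-1]:
                simplex[-1], values[-1] = contr, f_contr
            else:  # shrink every vertex towards the best one
                best = simplex[0]
                simplex = [best] + [[bj + 0.5 * (xj - bj) for bj, xj in zip(best, x)]
                                    for x in simplex[1:]]
                values = [values[0]] + [f(x) for x in simplex[1:]]
    k_best = min(range(n + 1), key=lambda k: values[k])
    return simplex[k_best], values[k_best]

def weighted_sse(params: Sequence[float], times: Sequence[float],
                 observed: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Residue: weighted sum of squared errors for the model y(t) = y0 * exp(-k t)."""
    y0, k = params
    weights = weights or [1.0] * len(times)
    return sum(w * (y_obs - y0 * math.exp(-k * t)) ** 2
               for w, t, y_obs in zip(weights, times, observed))
```

In a full treatment, the objective would call a network integrator and sum the errors over all six concentration columns of Table 1.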

Optimizing only the forward rate constants A(i) against the experimental data of Table 1 gives the results shown in Fig. 1.
Exp   Time    Conc1     Conc2     Conc3     Conc4     Conc5     Conc6
 1       0   82.0000    0.0000    0.0000    0.0000    0.0000
 2       5   75.0000    1.2000    0.0000    6.2000    0.0090    0.0060
 3      10   70.0000    4.3000    0.0000    3.6000    0.0090    0.0000
 4      20   60.0000    6.2000    0.0000   10.4000    3.5090    0.0000
 5      30   55.0000    3.5060    0.0000    9.2000    5.0090    0.0000
 6      40   50.0000   11.5000    0.0000    8.5000    7.0090    0.0000
 7      60   40.0000   13.0000   12.1900    7.0000    8.0000    1.0000
 8      80   30.0000   15.0000   15.0000    5.7000    8.6000    1.5000
 9     100   20.0000   16.6000   21.0000    4.5000   10.0000    2.0000
10     120   15.0000   16.0000   28.0000    2.8000   10.4080    2.2000
11     150    5.0000   15.2000   37.4000    1.8000    9.3090    3.0000
12     200    0.0000   13.0000   45.0000    1.0000    8.0090    3.5000
13     248    0.0000   11.6000   52.3800    8.0000    7.0090    4.2000
14     300    0.0000    8.0000   61.7000    8.0000    5.4080    7.4000

Table 1 : Example of experimental data editing


The  optimized  residue  (weighted  sum  of  squares  of  errors)  is 
1228  and  it  can  be  seen  that  some  curves  do  not  fit  very  well, 
particularly  those  simulating  the  reactivity  of  the  intermediates 
Y2,  Y4  and  Y5.  This  was  not  unexpected  because  of  the  reverse 
reactions  present  under  these  experimental  conditions  (70  bars 
H2).  A  pressure  of  140  bars  is  required  to  minimize  the  influence 
of  the  reverse  reactions. 


Fig.  1  :  Optimization  on  forward  reaction  rate  constants 
A second optimization is then carried out, taking into account both the forward and the reverse reaction rate constants. The simulated curves fit the experimental data fairly well, as seen in Fig. 2. The residue is now 388. Considering the experimental error on the data points, these results could be sufficient for the experimenter. Moreover, the calculated rate constants are consistent with those obtained for related model compounds.


Fig. 2 : Optimization on forward and reverse reaction rate constants


The final optimization (Fig. 3) also includes the adsorption constants of Y1, Y2 ... Y6. It yields the smallest residue (250) and the best fit of the simulated curves. These results are still consistent with the reactivity of related model compounds and with their aromatic properties: Y1 and Y4 are the most aromatic and, as a consequence, the most strongly adsorbed, as already observed for other models (ref. 4).
Fig. 3 : Optimization on forward and reverse reaction rate constants and on adsorption constants.


CONCLUSION 

The AnaCin software makes it easy to interpret the experimental kinetic results obtained for complex reactions. It also confirms the validity of the kinetic model used in our approach. The software has a low running cost and requires only widely used computer equipment. It can determine automatically up to twenty constants from one set of experimental data, and it displays the results by comparing experimental and simulated curves.

This  first  step  was  an  absolute  necessity  to  better  understand 
the  inhibition  phenomena  occurring  in  hydrotreating  reactions. 
The  following  step  will  deal  with  the  modelling  of  the  actual 
inhibiting  effects. 
I. INTRODUCTION

Fast chemical reactions in the condensed phase are controlled by the transport of the reactants. Traditional approaches to diffusion-controlled reactions all rely on Fick's law to describe the mutual motion of the reactants, e.g., the well-known Smoluchowski theory (refs. 1,2). Although such approaches have contributed much to our understanding of various diffusion-controlled rate processes, experiments sometimes reveal their inadequacy. For example, within the framework of Smoluchowski theory, parameters such as the encounter diameter and the mutual diffusion coefficient take inconsistent values when they are determined from time-resolved and from continuous-excitation fluorescence quenching experiments (refs. 3,4). We wished to test the traditional approaches based on diffusion equations stringently and to find the origins of any flaws they might have. We have therefore recently carried out a molecular dynamics simulation of a model diffusion-controlled reaction in which the solvent is treated on an equal footing with the reactants. As expected, the diffusion equation approach is found to be inadequate at short times. The short-time dynamics is undoubtedly non-Markovian. To take non-Markovian effects into account, we have developed a new theoretical approach based on a generalized diffusion equation. Here we give a brief review of our recent efforts in the study of diffusion-controlled reactions.

II.  SIMULATION 
A.  Model 

An  irreversible  chemical  reaction  between  the  reactive  solutes  A  and  B  in  an  inert 
solvent  S  consists  of  two  steps:  the  transport  of  reactants  toward  each  other  to  form 
encounter  pairs  and  the  intrinsic  chemical  change  which  yields  final  products,  i.e., 

$$\mathrm{A} + \mathrm{B} \xrightarrow{\;k\;} (\mathrm{AB}) \xrightarrow{\;k_a\;} \text{Products}$$

where k and k_a are the rate coefficients of the two steps, respectively. If the last step is very fast (k_a ≫ k), the overall rate of the reaction is dominated by the encounter rate. Such a reaction is called diffusion-controlled, and it is the subject of the work presented here.

When the intrinsic reaction rate is very large, the transformation of reactants into products can be considered to take place instantaneously once the reactants reach their encounter distance. This is the argument underlying Smoluchowski's absorbing boundary condition for describing reactivity. In a simulation, it means that every collision between reactive molecules leads to an instantaneous reaction.

In our first simulation (ref. 5), reactants and solvent molecules are all modeled as hard spheres of the same size and mass. Reactants differ from solvent molecules only in their reactivity. In the implementation of a simulation, the reactants can therefore be distinguished from the solvent molecules simply by labelling.

Although the above model for diffusion-controlled reactions is highly idealized, it can be described as a simple "civilized" model in which the solvent is treated on an equal footing with the reactants. This contrasts with the continuum solvent model used in the diffusion equation approach.

B.  Simulation  method 

Since we wish to study a prototype of the fluorescence quenching process, our first simulation is carried out for a special case. One reactive species, A (the fluorophores), is highly dilute, and the other reactants, B (the quenchers), are in large excess. Under this condition, each A reacts independently of the others, so only one A molecule is needed to generate a reactive event in the simulation.

Starting from an equilibrium configuration, reactants A and B are designated at random. The trajectories of all molecules in the simulation system are then generated and followed by the molecular dynamics simulation method (ref. 6). Once a collision between the molecule A and a molecule B occurs, trajectory generation is stopped and the survival time of the molecule A is recorded; this completes one reactive event. Good statistics for the survival probability of the molecule A require a large number of reactive events, of the order of $10^5$. A procedure to generate the reactive events efficiently has been developed; the reader is referred to ref. 5 for details.

One main quantity calculated from the simulation is the survival probability of the molecule A. It is defined as the probability that A remains unreacted up to time $t$ after its creation, i.e.,

$$S_A(t) = \lim_{N_A^0 \to \infty} \frac{N_A(t)}{N_A^0}, \qquad (1)$$

where $N_A(t)$ is the number of unreacted molecules A in the ensemble of reactive events at time $t$, and $N_A^0$ is the total number of reactive events in the ensemble, noting that $N_A^0 = N_A(t = 0)$. In practice, $S_A(t)$ is calculated from the directly measured survival times by using

$$S_A(t) = \frac{1}{N_A^0} \sum_{i=1}^{N_A^0} \theta(\tau_i - t), \qquad (2)$$

where the $\tau_i$ are the survival times of the molecule A in the reactive events and $\theta(x)$ is the Heaviside function.
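Eq. (2) counts, at each time t, the fraction of recorded survival times that exceed t. The following Python sketch assumes the survival times τ_i have already been collected from the simulated reactive events. The function name is ours. The convention θ(0) = 0 (an event whose survival time equals t counts as reacted) is also an assumption, since the text does not fix it.

```python
from bisect import bisect_right
from typing import List, Sequence


def survival_probability(survival_times: Sequence[float],
                         times: Sequence[float]) -> List[float]:
    """Estimate S_A(t) = (1/N_A^0) * sum_i theta(tau_i - t), Eq. (2), at each t.

    Assumed convention: theta(0) = 0, so an event with tau_i == t counts as reacted.
    """
    n_events = len(survival_times)  # N_A^0, the number of reactive events
    if n_events == 0:
        raise ValueError("at least one reactive event is required")
    ordered = sorted(survival_times)
    # bisect_right gives the number of events with tau_i <= t (already reacted);
    # the remaining events are still unreacted at time t.
    return [(n_events - bisect_right(ordered, t)) / n_events for t in times]


if __name__ == "__main__":
    taus = [0.5, 1.0, 1.5, 2.0]  # toy survival times, not simulation data
    print(survival_probability(taus, [0.0, 0.75, 1.0, 2.0]))  # [1.0, 0.75, 0.5, 0.0]
```

The code sorts once and then uses a binary search for each time. This keeps the cost low when S_A(t) is evaluated on a fine time grid for the ~10^5 events mentioned above.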

Once the survival probability is obtained, the time-dependent reaction rate coefficient $k(t)$ can be calculated by using the phenomenological kinetic law

$$\frac{dS_A(t)}{dt} = -k(t)\, S_A(t)\, \rho_B, \qquad (3)$$

where $\rho_B$ is the density of reactants B.
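Rearranged, Eq. (3) gives k(t) = -(1/ρ_B) d ln S_A(t)/dt. The sketch below performs that inversion by finite differences: central differences inside the grid and one-sided differences at the two ends. It assumes S_A(t) is tabulated on an increasing time grid and is strictly positive there. The differencing scheme and the function name are illustrative choices, not necessarily those of the original work.

```python
import math
from typing import List, Sequence


def rate_coefficient(times: Sequence[float], survival: Sequence[float],
                     rho_b: float) -> List[float]:
    """k(t) = -(1/rho_B) d ln S_A / dt from Eq. (3), by finite differences.

    Needs at least two grid points, strictly increasing times and S_A > 0
    (math.log raises ValueError for S_A <= 0).
    """
    if len(times) != len(survival) or len(times) < 2:
        raise ValueError("need matching grids with at least two points")
    log_s = [math.log(s) for s in survival]
    last = len(times) - 1
    rates = []
    for i in range(len(times)):
        # Central difference inside the grid, one-sided at the two ends.
        lo, hi = max(i - 1, 0), min(i + 1, last)
        slope = (log_s[hi] - log_s[lo]) / (times[hi] - times[lo])
        rates.append(-slope / rho_b)
    return rates


if __name__ == "__main__":
    k0, rho = 2.0, 0.5  # hypothetical constant rate and density
    grid = [0.1 * i for i in range(11)]
    surv = [math.exp(-k0 * rho * t) for t in grid]
    print(rate_coefficient(grid, surv, rho))  # every entry is close to 2.0
```

For a pure exponential decay, the sketch returns the constant rate at every grid point. With simulated data, few molecules survive at long times, which makes the logarithmic derivative noisy there. Some smoothing or binning would likely be needed.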

C.  Results  and  discussion 

The simulation result for the survival probability is given in Fig. 1, along with the result of Smoluchowski theory,

$$S_A^{S}(t) = \exp\left\{-4\pi\sigma D \rho_B \left(t + 2\sigma\sqrt{\frac{t}{\pi D}}\right)\right\}, \qquad (4)$$

where $D$ is the mutual diffusion coefficient of the reactants and $\sigma$ the encounter diameter. Fig. 1 shows that the simulated curve of $S_A(t)$ lies below that of Smoluchowski theory. This means that Smoluchowski theory underestimates the reaction rate, at least over some interval of time.

The time-dependent rate coefficient given by Smoluchowski theory is

$$k_S(t) = 4\pi\sigma D\left(1 + \frac{\sigma}{\sqrt{\pi D t}}\right). \qquad (5)$$
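Eqs. (4) and (5) can be coded directly. The sketch below also checks numerically, via Eq. (3), that the rate coefficient of Eq. (5) equals -(1/ρ_B) d ln S_A^S/dt for the survival probability of Eq. (4). Function names and the reduced parameter values are hypothetical and are not fitted to Fig. 1.

```python
import math


def smoluchowski_rate(t: float, sigma: float, d: float) -> float:
    """k_S(t) = 4 pi sigma D [1 + sigma / sqrt(pi D t)], Eq. (5); needs t > 0."""
    return 4.0 * math.pi * sigma * d * (1.0 + sigma / math.sqrt(math.pi * d * t))


def smoluchowski_survival(t: float, sigma: float, d: float, rho_b: float) -> float:
    """S_A^S(t) = exp{-4 pi sigma D rho_B [t + 2 sigma sqrt(t / (pi D))]}, Eq. (4)."""
    bracket = t + 2.0 * sigma * math.sqrt(t / (math.pi * d))
    return math.exp(-4.0 * math.pi * sigma * d * rho_b * bracket)


if __name__ == "__main__":
    sigma, d, rho_b = 1.0, 0.05, 0.0644  # hypothetical reduced values
    t, h = 2.0, 1e-5
    # Eq. (3) links the two: k_S(t) = -(1/rho_B) d ln S_A^S / dt (central difference).
    numeric = -(math.log(smoluchowski_survival(t + h, sigma, d, rho_b))
                - math.log(smoluchowski_survival(t - h, sigma, d, rho_b))) / (2.0 * h * rho_b)
    print(numeric, smoluchowski_rate(t, sigma, d))  # the two values agree closely
```

At long times the square-root correction in Eq. (5) dies away, and k_S(t) tends to the steady value 4πσD.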

The comparison of $k_S(t)$ with the simulation result is presented in Fig. 2. The most remarkable feature of Fig. 2 is a clear demarcation between two time regions. At long times, $t > t_c$, Smoluchowski theory is a good approximation. At short times, $t < t_c$, however, the simulation result differs considerably from Smoluchowski theory. The demarcation time $t_c$ corresponds to about 30 collisions of the reactants with solvent molecules before reaction. It is illuminating to note that velocity correlations become negligible beyond about 30 collisions in a dense fluid. The difference observed between $k_{MD}(t)$ and $k_S(t)$ is therefore clearly due to dynamic correlations at short times. As a consequence, the dynamics of reactive encounters is nondiffusional and essentially non-Markovian at short times. In the next section, we show that the non-Markovian effect can account qualitatively for the larger reaction rate, compared with Smoluchowski's result.


Fig. 1. Survival probability. Solid line: MD result; dotted line: Smoluchowski theory. Calculation conditions: total fluid density $\rho\sigma^3 = 0.786$ and $\rho_B\sigma^3 = 0.0644$. Unit for time: $(\sigma^2 m/k_BT)^{1/2}$.

Fig. 2. Time-dependent rate coefficient. (i) MD result; (ii) Smoluchowski theory. Same calculation conditions as Fig. 1. Unit for rate coefficient: $(\sigma^4 k_BT/m)^{1/2}$.


The larger reaction rate observed in the simulation implies that the exact Stern-Volmer plot lies above that of Smoluchowski theory. This has been observed experimentally (ref. 3). We can now understand, at least qualitatively, why different values of $\sigma$ and $D$ are obtained from time-resolved and continuous-excitation experiments. The results of a continuous-excitation experiment contain the whole of the nondiffusional short-time dynamics in integrated form. Since Smoluchowski theory is inadequate in the short-time region, fitting continuous-excitation data with this theory will inevitably introduce errors in $\sigma$ and $D$. The non-Markovian effect appears to be the essential cause.

III.  THEORY 

From the above discussion, the non-Markovian effect is very likely responsible for the nondiffusional behavior of the encounter dynamics at short times. In this section, we show how to take it into account and what its influence on the reaction kinetics is.

A. Generalized diffusion equation

A convenient starting point for taking the non-Markovian effect into account in Brownian motion is the generalized Langevin equation. For the velocity of a free Brownian particle it has the form

$$m\dot{v}(t) = -m\int_0^t ds\, \xi(t-s)\, v(s) + R(t), \qquad (6)$$

where $m$ is the mass of the particle and $R(t)$ the random force. The friction kernel $\xi(t)$ is also called the memory function of the velocity autocorrelation function.

Assuming a Gaussian process for the random force, Adelman derived from Eq. (6) a generalized diffusion equation for a free Brownian particle (ref. 7),

$$\frac{\partial \rho(\mathbf{r},t)}{\partial t} = D(t)\, \nabla^2 \rho(\mathbf{r},t), \qquad (7)$$

where $D(t)$ is a time-dependent diffusion coefficient related to the memory function by

$$D(t) = \frac{k_B T}{m}\, \mathcal{L}^{-1}\left[\frac{1}{z\,\hat{\xi}(z)}\right]. \qquad (8)$$

Here $k_B$ is the Boltzmann constant, $T$ the absolute temperature, and $\hat{\xi}(z)$ the Laplace transform of the memory function,

$$\hat{\xi}(z) = \int_0^\infty dt\, e^{-zt}\, \xi(t),$$

and $\mathcal{L}^{-1}$ denotes the inverse Laplace transform.

Before the generalized diffusion equation can be used to treat any concrete problem of diffusion-controlled reactions, the memory function has to be specified so that the time-dependent diffusion coefficient can be determined. Theoretical approaches now exist for calculating memory functions from microscopic theories (refs. 8, 9). We nevertheless choose the relaxation-time approximation to keep the subsequent development as simple as possible, so that clear insight can be gained into the non-Markovian effect on diffusion-controlled reactions. The adopted memory function takes the following form:

$$\xi(t) = a\,\delta(t) + \phi_0\, e^{-t/\tau}, \qquad (9)$$

where $a$, $\phi_0$ and $\tau$ are all positive quantities (ref. 10). If $\phi_0 = 0$, the Markovian case is recovered. Moreover, it is not difficult to show that the Dirac delta part is necessarily present for systems containing hard cores. The approximation of the memory function given in Eq. (9) has also been used recently to study the non-Markovian effect on isomerization reactions (ref. 11).
From Eqs. (8) and (9), straightforward calculations yield

$$D(t) = D\left(1 + A\, e^{-t/\mu}\right), \qquad (10)$$

where

$$A = \frac{\tau\phi_0}{a}, \qquad (11)$$

$$\mu = \frac{a\tau}{\tau\phi_0 + a} = \frac{\tau}{1 + A}, \qquad (12)$$

and $D$ is the diffusion constant. It is related to $a$, $\phi_0$ and $\tau$ through the fluctuation-dissipation theorem

$$D = \frac{k_B T}{m(a + \tau\phi_0)}. \qquad (13)$$
The time-dependent diffusion coefficient given in Eq. (10) reduces to the diffusion constant $D$ in the limit of long times. At short times the time-dependent part gives a positive contribution. Transport is therefore faster at short times than predicted by the classical Fick's law. This enhanced transport is the origin of the larger encounter rate observed in the MD simulation. In what follows, we examine the influence of the non-Markovian effect on the reaction kinetics.
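The step from Eqs. (8) and (9) to Eqs. (10)-(13) is a partial-fraction inversion. The Python sketch below checks it numerically instead of redoing the algebra. It compares the analytic Laplace transform of Eq. (10) with the quantity (k_BT/m)/[z ξ̂(z)] that Eq. (8) inverts. It assumes the delta term in Eq. (9) contributes its full weight a to ξ̂(z), which is the convention under which Eqs. (11)-(13) come out as written. Function names and parameter values are arbitrary.

```python
import math


def memory_laplace(z: float, a: float, phi0: float, tau: float) -> float:
    """Laplace transform of xi(t) = a delta(t) + phi0 exp(-t/tau), Eq. (9).

    Assumes the delta term contributes its full weight a.
    """
    return a + phi0 * tau / (1.0 + z * tau)


def gde_parameters(a: float, phi0: float, tau: float, kbt_over_m: float):
    """Return (A, mu, D) from Eqs. (11)-(13)."""
    amp = tau * phi0 / a                   # A, Eq. (11)
    mu = a * tau / (a + tau * phi0)        # mu = tau / (1 + A), Eq. (12)
    d_inf = kbt_over_m / (a + tau * phi0)  # D, Eq. (13)
    return amp, mu, d_inf


def diffusion_coefficient(t: float, a: float, phi0: float, tau: float,
                          kbt_over_m: float) -> float:
    """D(t) = D [1 + A exp(-t/mu)], Eq. (10)."""
    amp, mu, d_inf = gde_parameters(a, phi0, tau, kbt_over_m)
    return d_inf * (1.0 + amp * math.exp(-t / mu))


def laplace_of_eq10(z: float, a: float, phi0: float, tau: float,
                    kbt_over_m: float) -> float:
    """Analytic Laplace transform of Eq. (10): D [1/z + A / (z + 1/mu)]."""
    amp, mu, d_inf = gde_parameters(a, phi0, tau, kbt_over_m)
    return d_inf * (1.0 / z + amp / (z + 1.0 / mu))


def laplace_from_eq8(z: float, a: float, phi0: float, tau: float,
                     kbt_over_m: float) -> float:
    """The transform inverted in Eq. (8): (k_B T / m) / [z xi_hat(z)]."""
    return kbt_over_m / (z * memory_laplace(z, a, phi0, tau))


if __name__ == "__main__":
    params = dict(a=2.0, phi0=3.0, tau=0.5, kbt_over_m=1.0)  # arbitrary positive values
    for z in (0.1, 1.0, 10.0):
        print(z, laplace_of_eq10(z, **params), laplace_from_eq8(z, **params))
    # Short-time value D(0) = D (1 + A) = (k_B T / m) / a.
    print(diffusion_coefficient(0.0, **params), params["kbt_over_m"] / params["a"])
```

The last check makes the short-time behavior explicit. D(0) = D(1 + A) = k_BT/(ma) depends only on the instantaneous (delta) part of the friction, and it exceeds the long-time value D whenever φ_0 > 0.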


B.  Generalized  Smoluchowski  theory 

To take reactivity into account, we adopt the same scheme as Smoluchowski, i.e., an absorbing boundary condition at the encounter distance. Now, however, the generalized diffusion equation, Eq. (7), is solved instead of the usual diffusion equation, with the following boundary and initial conditions:

$$\rho(r = \sigma, t) = 0, \qquad (14\text{i})$$

$$\rho(r \to \infty, t) = 1, \qquad (14\text{ii})$$

$$\rho(r, t = 0) = \begin{cases} 0, & r < \sigma, \\ 1, & r > \sigma. \end{cases} \qquad (15)$$

Although Eq. (7) has a nonconstant diffusion coefficient, we have still been able to find the analytic solution under the above conditions:
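$$\rho(r, t) = 1 - \frac{\sigma}{r}\, \operatorname{erfc}\left[\frac{r - \sigma}{2\left(\int_0^t ds\, D(s)\right)^{1/2}}\right], \qquad (16)$$

where erfc(x) is the complementary error function. The method of solution is given in ref. 10. The time-dependent reaction rate coefficient is then given by

$$k(t) = 4\pi\sigma^2 D(t) \left.\frac{\partial \rho(r,t)}{\partial r}\right|_{r=\sigma} = 4\pi\sigma D(t)\left[1 + \frac{\sigma}{\left(\pi\int_0^t ds\, D(s)\right)^{1/2}}\right]. \qquad (17)$$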
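Eq. (17) follows from Eq. (16) by differentiating at the contact distance. A short Python sketch can confirm this numerically and check the boundary conditions (14i) and (14ii). The integral ∫_0^t ds D(s) is passed in as a number, so the check holds for any positive D(t). All names and values are illustrative.

```python
import math


def profile(r: float, d_int: float, sigma: float) -> float:
    """rho(r, t) = 1 - (sigma/r) erfc[(r - sigma) / (2 sqrt(I))], Eq. (16).

    d_int is I(t) = integral_0^t D(s) ds and must be positive.
    """
    return 1.0 - (sigma / r) * math.erfc((r - sigma) / (2.0 * math.sqrt(d_int)))


def rate_from_profile(d_t: float, d_int: float, sigma: float, h: float = 1e-5) -> float:
    """k(t) = 4 pi sigma^2 D(t) (d rho / dr) at r = sigma.

    Uses a second-order one-sided difference, staying in the physical region r >= sigma.
    """
    f0 = profile(sigma, d_int, sigma)
    f1 = profile(sigma + h, d_int, sigma)
    f2 = profile(sigma + 2.0 * h, d_int, sigma)
    slope = (-3.0 * f0 + 4.0 * f1 - f2) / (2.0 * h)
    return 4.0 * math.pi * sigma**2 * d_t * slope


def rate_closed_form(d_t: float, d_int: float, sigma: float) -> float:
    """k(t) = 4 pi sigma D(t) [1 + sigma / sqrt(pi I)], Eq. (17)."""
    return 4.0 * math.pi * sigma * d_t * (1.0 + sigma / math.sqrt(math.pi * d_int))


if __name__ == "__main__":
    sigma, d_t, d_int = 1.0, 0.3, 0.05  # hypothetical values of sigma, D(t), I(t)
    print(profile(sigma, d_int, sigma))  # 0.0: absorbing boundary, Eq. (14i)
    print(rate_from_profile(d_t, d_int, sigma), rate_closed_form(d_t, d_int, sigma))
```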


Integrating Eq. (3) and using the above result for the rate coefficient, we find the following result for the survival probability:

$$S_A^{GS}(t) = \exp\left\{-4\pi\sigma\rho_B\left[\int_0^t ds\, D(s) + 2\sigma\left(\frac{1}{\pi}\int_0^t ds\, D(s)\right)^{1/2}\right]\right\}, \qquad (18)$$

where the superscript GS refers to the generalized Smoluchowski theory.

It is worth noting that the above result can be recast in a very interesting form:

$$S_A^{GS}(t) = S_A^{S}\left(\frac{1}{D}\int_0^t ds\, D(s)\right), \qquad (19)$$

where $S_A^{S}(t)$ is Smoluchowski's result for the survival probability, given in Eq. (4). When its argument $t$ is replaced by $\int_0^t ds\, D(s)/D$, we obtain the result of the generalized Smoluchowski theory, which includes the non-Markovian effect.

It can now easily be shown that the non-Markovian effect essentially accounts for the deviation of $S_A^{S}(t)$ from the simulation result $S_A^{MD}(t)$. Using the relaxation-time approximation for the memory function, we obtain

$$S_A^{GS}(t) = S_A^{S}\left(t + \mu A\left(1 - e^{-t/\mu}\right)\right). \qquad (20)$$

As expected, $S_A^{GS}(t) \to S_A^{S}(t)$ when $t \gg \mu$. At short times, however, it is easily seen that $S_A^{GS}(t) < S_A^{S}(t)$. These results agree qualitatively with the simulation result (see Fig. 1). We therefore ascribe the enhanced reaction rate found at short times in the MD simulation to the non-Markovian effect.
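For the relaxation-time memory function, ∫_0^t ds D(s) = D[t + μA(1 - e^{-t/μ})]. Eqs. (18) and (20) can therefore be evaluated in closed form. The sketch below checks that the two expressions agree and that S_A^GS(t) lies below S_A^S(t). The reduced values chosen for σ, D, ρ_B, A and μ are hypothetical.

```python
import math


def smoluchowski_survival(t: float, sigma: float, d: float, rho_b: float) -> float:
    """S_A^S(t), Eq. (4)."""
    return math.exp(-4.0 * math.pi * sigma * d * rho_b
                    * (t + 2.0 * sigma * math.sqrt(t / (math.pi * d))))


def integrated_diffusion(t: float, d: float, amp: float, mu: float) -> float:
    """I(t) = integral_0^t D(s) ds for D(s) = D [1 + A exp(-s/mu)], Eq. (10)."""
    return d * (t + mu * amp * (1.0 - math.exp(-t / mu)))


def generalized_survival(t: float, sigma: float, d: float, amp: float,
                         mu: float, rho_b: float) -> float:
    """S_A^GS(t) from Eq. (18): exp{-4 pi sigma rho_B [I + 2 sigma sqrt(I / pi)]}."""
    i_t = integrated_diffusion(t, d, amp, mu)
    return math.exp(-4.0 * math.pi * sigma * rho_b
                    * (i_t + 2.0 * sigma * math.sqrt(i_t / math.pi)))


if __name__ == "__main__":
    sigma, d, rho_b = 1.0, 0.05, 0.0644  # hypothetical reduced values
    amp, mu = 0.75, 0.3                  # hypothetical A and mu
    for t in (0.1, 1.0, 10.0):
        s_gs = generalized_survival(t, sigma, d, amp, mu, rho_b)
        # Eq. (20): Smoluchowski's formula evaluated at a stretched time argument.
        stretched = t + mu * amp * (1.0 - math.exp(-t / mu))
        s_shift = smoluchowski_survival(stretched, sigma, d, rho_b)
        print(t, s_gs, s_shift, smoluchowski_survival(t, sigma, d, rho_b))
```

Setting A = 0 (the Markovian case, φ_0 = 0) makes the two survival probabilities coincide, as Eq. (10) implies.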


IV.  CONCLUSIONS  AND  PROSPECTS 

This paper has summarized our recent work on diffusion-controlled reactions. We have carried out a full molecular dynamics simulation in which the solvent is treated on an equal footing with the reactants. Traditional approaches such as Smoluchowski theory instead describe the solvent as a continuum. The reactive encounter dynamics is found to be nondiffusional at short times, i.e., it cannot be adequately described by the classical diffusion equation approach. The demarcation between the two time scales is essentially set by the duration of dynamic correlations. The nondiffusional dynamics enhances the reaction rate. At short times, the rate found in the MD simulation can be about twice that given by Smoluchowski theory. This enhanced nondiffusional dynamics also helps explain an experimental fact. The Stern-Volmer plot measured in a continuous-excitation experiment always lies above the one obtained by integrating the time-dependent fluorescence decay fitted with Smoluchowski theory.

Dynamic correlations mean that a Markovian description of the dynamics is no longer adequate. It is therefore natural to ascribe the short-time deviation of Smoluchowski theory from the MD simulation to the non-Markovian effect. To take this effect into account, we have developed a new theoretical approach based on a generalized diffusion equation. The non-Markovian effect enters the theory through a time-dependent diffusion coefficient related to the memory function. With a relaxation-time approximation for the memory function, it can be shown that reactive particles approach each other faster than predicted by the classical Fick's law. This enhanced transport, due to dynamic correlations, satisfactorily accounts for the enhanced reaction rate observed in the MD simulation. When Smoluchowski's absorbing boundary condition is used for reactivity, the generalized diffusion equation can still be solved analytically. The resulting expressions are as simple as those of the classical Smoluchowski theory (see, e.g., Eqs. (16)-(19)). We therefore strongly recommend the new approach for future work on diffusion-controlled reactions, especially for interpreting experimental data. It is hoped that the parameters obtained will then be more consistent, whether time-resolved or continuous-excitation data are used. Our group plans to analyze these experimental data with the generalized Smoluchowski theory.

In recent years, the influence of the non-Markovian effect on reaction dynamics in the condensed phase has attracted much interest (ref. 12). However, theoretical effort has focused exclusively on the barrier-crossing dynamics of activation-controlled reactions. The results presented in this paper show that the non-Markovian effect is also important for the mutual approach of reactants in diffusion-controlled reactions. These results apply to very simple diffusion-controlled reactions. Many other diffusion-controlled reactions involve complications. For reactions between charged species, for example, Coulombic interactions have to be considered, and intramicellar reactions occur in a limited space. We believe that the non-Markovian effect certainly also plays some role in these diffusion-controlled reactions. The theoretical approach presented in Sec. III can be extended in a straightforward way to these problems. The resulting expressions will be no more complicated than their counterparts without the non-Markovian effect.

When a reaction is only partially controlled by diffusion, reactivity is traditionally treated with the radiation boundary condition (ref. 13). We have made some preliminary attempts to solve the generalized diffusion equation with this boundary condition, but no analytic solution has yet been found. Nevertheless, this problem well deserves further effort. The aim would be at least an approximate analytic solution, or upper and lower bounds on the reaction rate obtained from a variational principle.


REFERENCES

1. M.V. Smoluchowski, Z. Phys. Chem., 92, 129 (1917).
2. S.A. Rice, Diffusion-Limited Reactions, in Comprehensive Chemical Kinetics, Vol. 25, ed. C.H. Bamford, C.F.H. Tipper and R.G. Compton (Elsevier, 1985).
3. T.L. Nemzek and W.R. Ware, J. Chem. Phys., 62, 477 (1975).
4. J.C. Andre, M. Niclause and W.R. Ware, Chem. Phys., 28, 371 (1978).
5. W. Dong, F. Baros and J.C. Andre, J. Chem. Phys. (in press).
6. B.J. Alder and T.E. Wainwright, J. Chem. Phys., 31, 459 (1959); ibid., 33, 1439 (1960).
7. S.A. Adelman, J. Chem. Phys., 64, 124 (1976).
8. J.P. Hansen and I.R. McDonald, Theory of Simple Liquids (2nd edition, Academic Press, 1986).
9. G.F. Mazenko and S. Yip, in Statistical Mechanics, Part B, ed. B.J. Berne (Plenum Press, 1977).
10. W. Dong, F. Baros and J.C. Andre, Ber. Bunsenges. Phys. Chem. (to appear).
11. S.B. Zhu, J. Lee, G.W. Robinson and S.H. Lin, J. Chem. Phys., 90, 6335; 6340 (1989).
12. A. Nitzan, Adv. Chem. Phys., 70, 489 (1988).
13. F.C. Collins and G.E. Kimball, J. Colloid Sci., 4, 425 (1949).




Modelling  of  Molecular  Structures  and  Properties.  Proceedings  of  an  International  Meeting, 
Nancy, France, 11-15 September 1989, J.-L. Rivail (Ed.)

Studies  in  Physical  and  Theoretical  Chemistry,  Volume  71,  pages  427-461 
©  1990  Elsevier  Science  Publishers  B.V.,  Amsterdam  —  Printed  in  The  Netherlands 


MOLECULAR  DYNAMICS  :  APPLICATIONS  TO  PROTEINS 
M.  KARPLUS 

Department  of  Chemistry,  Harvard  University,  Cambridge,  Massachusetts  02138, 
U.S.A. 


INTRODUCTION 

Molecular dynamics of macromolecules of biological interest began in 1977 with the publication of a paper on the simulation of a small protein, the bovine pancreatic trypsin inhibitor (McCammon et al., 1977). From a dynamical viewpoint the trypsin inhibitor is rather uninteresting, since its function is to bind to trypsin. Nevertheless, experimental and theoretical studies of this model system, the "hydrogen atom" of protein dynamics, served to initiate explorations in this field.

The most important consequence of the first simulations of biomolecules was that they introduced a conceptual change (Karplus and McCammon, 1981; Brooks et al., 1987). To chemists and physicists it is self-evident that polymers like proteins and nucleic acids undergo significant fluctuations at room temperature. Nevertheless, the classic view of such molecules in their native state had been static in character. This followed from the dominant role of high-resolution x-ray crystallography in providing structural information for these complex systems. The remarkable detail evident in crystal structures led to an image of biomolecules with every atom fixed in place. D.C. Phillips, who determined the first enzyme crystal structure, wrote "the period 1965-75 may be described as the decade of the rigid macromolecule. Brass models of DNA and a variety of proteins dominated the scene and much of the thinking" (Phillips, 1981). Molecular dynamics simulations have been instrumental in changing the static view of the structure of biomolecules to a dynamic picture. It is now recognized that the atoms of which biopolymers are composed are in constant motion at ordinary temperatures. The x-ray structure of a protein provides approximate average atomic positions, but the atoms exhibit fluid-like motions of sizable amplitudes about these averages. Crystallographers have acceded to this viewpoint. They have even gone so far as to emphasize the parts of a molecule they do not see in a crystal structure as evidence of motion or disorder (Marquart et al., 1980). Thus, the knowledge of protein dynamics subsumes the static picture, in that the average positions still allow many aspects of biomolecule function to be discussed in the language of structural chemistry. The recognition of the importance of fluctuations opens the way for more sophisticated and accurate interpretations.

Simulation studies of biomolecules have the potential to provide the ultimate detail concerning motional phenomena (Brooks et al., 1987). The primary limitation of simulation methods is that they are approximate. It is here that experiment plays an essential role in validating the simulations; that is, comparisons with experimental data can serve to test the accuracy of the calculations and to provide criteria for improving the methodology. When experimental comparisons indicate that the simulations are meaningful, their detailed results often make it possible to examine specific aspects of the atomic motions far more easily than measurements can.

In  what  follows,  a  brief  introduction  to  molecular  dynamics  will  be  given, 
followed  by  applications  that  illustrate  its  utility  for  increasing  our  understanding 
of  proteins,  including  enzymes,  and  for  interpreting  experiments  in  a  more 
effective  way. 

METHODOLOGY 

To study the dynamics of a macromolecular system theoretically, it is essential to know the potential energy surface, which gives the energy of the system as a function of the atomic coordinates. The potential energy can be used directly to determine the relative energies of the different possible structures of the system. Under conditions of thermal equilibrium, the Boltzmann distribution law gives the relative populations of such structures in terms of the potential energy (McQuarrie, 1976). The mechanical forces acting on the atoms of the system are simply related to the first derivatives of the potential with respect to the atomic positions. These forces can be used to calculate dynamical properties of the system, e.g., by solving Newton's equations of motion to determine how the atomic positions change with time (McQuarrie, 1976; Hansen and McDonald, 1976). The second derivatives of the potential surface give the force constants for small displacements, from which the normal modes can be found (Levy and Karplus, 1979). This serves as the basis for an alternative approach to the dynamics in the harmonic limit (Levy and Karplus, 1979; Brooks and Karplus, 1983).
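The three uses of the potential described above can be illustrated on a one-dimensional toy potential: Boltzmann populations, forces from first derivatives, and force constants and harmonic frequencies from second derivatives. The sketch below is a hypothetical illustration in reduced units, with names of our own choosing. It treats structures as discrete states weighted only by their potential energy, and it uses finite differences. Production macromolecular codes typically use analytic derivatives of the energy function instead.

```python
import math
from typing import Callable, List, Sequence


def boltzmann_populations(energies: Sequence[float], kbt: float) -> List[float]:
    """Equilibrium fractions p_i proportional to exp(-E_i / k_B T) for discrete structures."""
    e_min = min(energies)  # shifting all energies cancels on normalization, avoids overflow
    weights = [math.exp(-(e - e_min) / kbt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def force(potential: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """F = -dV/dx, estimated by a central difference of the potential."""
    return -(potential(x + h) - potential(x - h)) / (2.0 * h)


def harmonic_frequency(potential: Callable[[float], float], x_min: float,
                       mass: float, h: float = 1e-4) -> float:
    """omega = sqrt(k / m), with force constant k = d^2V/dx^2 at a minimum x_min."""
    k = (potential(x_min + h) - 2.0 * potential(x_min) + potential(x_min - h)) / h**2
    return math.sqrt(k / mass)


if __name__ == "__main__":
    def double_well(x: float) -> float:
        return (x * x - 1.0) ** 2  # toy potential with minima at x = +1 and x = -1

    print(force(double_well, 0.5))                    # positive: pushes toward x = +1
    print(harmonic_frequency(double_well, 1.0, 1.0))  # close to sqrt(8), about 2.828
    print(boltzmann_populations([0.0, 0.5], kbt=1.0))
```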

Although quantum mechanical calculations can provide potential surfaces for small molecules, empirical energy functions of the molecular mechanics type are the only possible source of such information for proteins and the surrounding solvent. Most of the motions that occur at ordinary temperatures leave the bond lengths and bond angles of the polypeptide chains near their equilibrium values. These values appear not to vary significantly throughout the protein (e.g., the standard dimensions of the peptide group first proposed by Pauling [Pauling et al., 1951]). The energy function representation of the bonding can therefore have an accuracy of the order of that achieved in the vibrational analysis of small molecules. Where globular proteins differ from small molecules is that the contacts among nonbonded atoms play an essential role in the potential energy of the folded or native structure. The pioneering conformational studies of Ramachandran and co-workers (Ramachandran et al., 1963) were successful using hard-sphere nonbonded radii. It is therefore likely that relatively simple functions (Lennard-Jones nonbonded potentials supplemented by electrostatic interactions) can adequately describe the interactions involved.

The energy functions used for proteins are generally composed of terms representing bonds, bond angles, torsional angles, van der Waals interactions and electrostatic interactions. The expression used in the program CHARMM (Brooks et al., 1983) has the form

$$E(R) = \frac{1}{2}\sum_{\text{bonds}} K_b (b - b_0)^2 + \frac{1}{2}\sum_{\text{bond angles}} K_\theta (\theta - \theta_0)^2 + \frac{1}{2}\sum_{\text{torsional angles}} K_\phi \left[1 + \cos(n\phi - \delta)\right] + \sum_{\substack{\text{nb pairs} \\ r < 8\,\mathrm{\AA}}} \left[\frac{A}{r^{12}} - \frac{C}{r^6} + \frac{q_i q_j}{D r}\right]. \qquad (1)$$

The energy is a function of the Cartesian coordinate set, $R$, specifying the positions of all the atoms involved. The calculation, however, first evaluates the internal coordinates for any given geometry $R$: bonds ($b$), bond angles ($\theta$), dihedral angles ($\phi$) and interparticle distances ($r$). These are then used to evaluate the contributions to Eq. (1), which depend on the following parameters:

- the bonding energy parameters $K_b$, $K_\theta$ and $K_\phi$;
- the Lennard-Jones parameters $A$ and $C$;
- the atomic charges $q_i$;
- the dielectric constant $D$;
- the geometrical reference values $b_0$, $\theta_0$, $n$ and $\delta$.

Most simulations have used a representation that replaces aliphatic groups (CH3, CH2, CH) by single extended atoms. The earliest studies employed the extended-atom representation for all hydrogens, but present calculations treat hydrogen-bonding hydrogens explicitly. The most detailed simulations include every protein atom (including aliphatic hydrogens) and explicit solvent molecules, e.g., a three-site or five-site model for each water molecule (Brooks et al., 1983).
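Each term of Eq. (1) depends on a single internal coordinate, so the energy is a sum of simple functions. The sketch below evaluates those terms for given internal coordinates. Names, parameter values and units are hypothetical, not CHARMM's actual code or parameter files. Real force fields also multiply the Coulomb term by a unit-conversion constant (about 332 for kcal/mol, Å and elementary charges), which is omitted here.

```python
import math
from typing import Iterable, Tuple


def bond_energy(b: float, k_b: float, b0: float) -> float:
    """(1/2) K_b (b - b0)^2 for one bond."""
    return 0.5 * k_b * (b - b0) ** 2


def angle_energy(theta: float, k_theta: float, theta0: float) -> float:
    """(1/2) K_theta (theta - theta0)^2 for one bond angle (radians)."""
    return 0.5 * k_theta * (theta - theta0) ** 2


def torsion_energy(phi: float, k_phi: float, n: int, delta: float) -> float:
    """(1/2) K_phi [1 + cos(n phi - delta)] for one dihedral angle (radians)."""
    return 0.5 * k_phi * (1.0 + math.cos(n * phi - delta))


def nonbonded_energy(r: float, a: float, c: float, qi: float, qj: float,
                     dielectric: float, cutoff: float = 8.0) -> float:
    """A/r^12 - C/r^6 + q_i q_j / (D r), set to zero beyond the cutoff (8 A in Eq. (1)).

    No Coulomb unit-conversion constant is applied here.
    """
    if r >= cutoff:
        return 0.0
    return a / r**12 - c / r**6 + qi * qj / (dielectric * r)


def total_energy(bonds: Iterable[Tuple], angles: Iterable[Tuple],
                 torsions: Iterable[Tuple], pairs: Iterable[Tuple]) -> float:
    """Sum the four kinds of terms; each tuple holds the arguments of one term."""
    return (sum(bond_energy(*args) for args in bonds)
            + sum(angle_energy(*args) for args in angles)
            + sum(torsion_energy(*args) for args in torsions)
            + sum(nonbonded_energy(*args) for args in pairs))


if __name__ == "__main__":
    # One stretched bond and one eclipsed threefold torsion, arbitrary units.
    print(total_energy([(1.1, 600.0, 1.0)], [], [(0.0, 2.0, 3, 0.0)], []))  # about 5.0
```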

Given a potential energy function, one may take any of a variety of approaches to study protein dynamics. The most detailed information is provided by molecular dynamics simulations, in which one uses a computer to solve the Newtonian equations of motion for the atoms of the protein and any surrounding solvent (McCammon et al., 1977; McCammon et al., 1979; van Gunsteren and Karplus, 1982). With currently available computers, it is possible to simulate the dynamics of small proteins for periods of up to a nanosecond. Such periods are long enough to characterize completely the librations of small groups in the protein and to determine the dominant contributions to the atomic fluctuations. Slower and more complex processes in proteins generally require methods other than straightforward molecular dynamics simulation. A variety of dynamical approaches can be introduced to study particular problems (Brooks et al., 1987), such as stochastic dynamics (Chandrasekhar, 1943), harmonic dynamics (Levy and Karplus, 1979; Brooks and Karplus, 1983) and activated dynamics (Northrup et al., 1982).

Since  molecular  dynamics  simulations  have  been  used  most  widely  for 
studying  protein  motions,  we  briefly  describe  the  methodology.  To  begin  a 
dynamical  simulation,  one  must  have  an  initial  set  of  atomic  coordinates  and 
velocities.  These  are  usually  obtained  from  the  x-ray  coordinates  of  the  protein 
by  a  preliminary  calculation  that  serves  to  equilibrate  the  system  (Brooks  et  al., 
1983).  The  x-ray  structure  is  first  refined  using  an  energy  minimization 
algorithm  to  relieve  local  stresses  due  to  non-bonded  atomic  overlaps,  bond 
length  distorsions,  etc.  The  protein  atoms  are  then  assigned  velocities  at  random 
from  a  Maxwellian  distribution  corresponding  to  a  low  temperature,  and  a 
dynamical  simulation  is  performed  for  a  period  of  a  few  psec.  The  equilibration 
is  continued  by  alternating  new  velocity  assignments  (chosen  from  Maxwellian 
distributions  corresponding  to  successively  increased  temperatures)  with  intervals 
of  dynamical  relaxation.  The  temperature,  T,  for  this  microcanonical  ensemble  is 
measured  in  terms  of  the  mean  kinetic  energy  for  the  system  composed  of  N 
atoms  as 

1/2  £  nii<vf  >=  \  NkBT  (2) 

i=l  2 

where  trij  and  <v2;  >  are  the  mass  and  average  velocity  squared  of  the  ith  atom, 
and  kB  is  the  Boltzmann  constant.  Any  residual  overall  translational  and  rotational 
motion  for  an  isolated  protein  can  be  removed  to  simplify  analysis  of  the 
subsequent  conformational  fluctuations ;  in  a  solution  simulation,  the  protein  can 
diffuse  through  the  solvent.  The  equilibration  period  is  considered  finished  when 
no  systematic  changes  in  the  temperature  are  evident  over  a  time  of  about  10  psec 
(slow  fluctuations  could  be  confused  with  continued  relaxation  over  shorter 
intervals).  It  is  necessary  also  to  check  that  the  atomic  momenta  obey  a 
Maxwellian  distribution  and  that  different  regions  of  the  protein  have  the  same 
average  temperature.  The  actual  dynamical  simulation,  which  provides 
coordinates  and  velocities  for  all  the  atoms  as  a  function  of  time,  is  then 
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performed  by  continuing  to  integrate  the  equations  of  motion  for  the  desired  time 
period.  The  available  simulations  for  proteins  range  from  25  to  300  psec.  Several 
different  algorithms  for  integrating  the  equations  of  motion  in  Cartesian 
coordinates  are  being  used  in  protein  molecular  dynamics  calculations.  Most 
common  are  the  Gear  predictor-corrector  algorithm,  familiar  from  small 
molecule  trajectory  calculations  (McCammon  et  al.,  1979)  and  the  Verlet 
algorithm  (Verlet,  1967),  widely  used  in  statistical  mechanical  simulations  (van 
Gunsteren  and  Berendsen,  1977). 
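The equilibration and integration steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: velocities are drawn per Cartesian component from a Gaussian of variance $k_B T/m_i$ (the Maxwellian distribution), Eq. (2) is inverted to give the instantaneous kinetic temperature, and the integrator is the velocity form of the Verlet algorithm. SI units and all function names are assumptions of the sketch; removal of overall translation and rotation (which would change the degree-of-freedom count in Eq. (2)) is omitted.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant, J/K

def maxwell_velocities(masses, temperature, rng):
    """Velocities (m/s) from a Maxwellian distribution: each Cartesian
    component is Gaussian with variance k_B T / m_i. masses: (N,) in kg."""
    sigma = np.sqrt(K_B * temperature / masses)[:, None]
    return rng.normal(size=(len(masses), 3)) * sigma

def kinetic_temperature(masses, velocities):
    """Invert Eq. (2): T = sum_i m_i v_i^2 / (3 N k_B)."""
    twice_kinetic = np.sum(masses[:, None] * velocities ** 2)
    return twice_kinetic / (3.0 * len(masses) * K_B)

def velocity_verlet_step(x, v, masses, force_fn, dt):
    """Advance positions x and velocities v (both (N, 3)) by one step dt."""
    a = force_fn(x) / masses[:, None]
    x_new = x + v * dt + 0.5 * a * dt ** 2
    a_new = force_fn(x_new) / masses[:, None]   # forces at the new positions
    v_new = v + 0.5 * (a + a_new) * dt          # average of old and new accelerations
    return x_new, v_new
```

A staged heating run would alternate `maxwell_velocities` at successively higher temperatures with stretches of `velocity_verlet_step`, checking `kinetic_temperature` until it shows no systematic drift.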

INTERNAL  MOTIONS  AND  THE  UNDERLYING  POTENTIAL 
SURFACE 

For  native  proteins  with  a  well-defined  average  structure,  two  extreme 
models  for  the  internal  motions  have  been  considered.  In  one,  the  fluctuations  are 
assumed  to  occur  within  a  single  multidimensional  well  that  is  harmonic  or 
quasiharmonic  as  a  limiting  case  (Karplus  and  Kushick,  1981  ;  Brooks  and 
Karplus,  1983  ;  Levitt  et  al.,  1985).  The  other  model  assumes  that  there  exist 
multiple  minima  or  substates  ;  the  internal  motions  correspond  to  a  superposition 
of  oscillations  within  the  wells  and  transitions  among  them  (Austin  et  al.,  1975  ; 
Frauenfelder et al., 1979 ; Levy et al., 1982 ; Swaminathan et al., 1982 ; Brooks
and Karplus, 1983 ; Debrunner and Frauenfelder, 1982). Experimental data have been
interpreted  with  both  models,  but  it  has  proved  difficult  to  distinguish  between 
them  (Agmon  and  Hopfield,  1983  ;  Ansari  et  al.,  1985). 

To  characterize  the  protein  potential  surface  structurally  and  energetically 
(Elber  and  Karplus,  1987),  we  use  a  300  ps  molecular  dynamics  simulation  of  the 
protein  myoglobin  at  300°K  ;  details  of  the  simulation  method  have  been 
presented  (Levy  et  al.,  1985).  Myoglobin  was  chosen  for  study  because  it  has  been 
examined  experimentally  by  a  variety  of  methods  and  the  two  motional  models 
have  been  applied  to  it  (  Austin  et  al.,  1975  ;  Frauenfelder  et  al.,  1979  ;  Levy  et 
al.,  1982  ;  Agmon  and  Hopfield,  1983  ;  Bialek  and  Goldstein,  1985).  It  is  ideally 
suited  for  the  present  analysis,  because  its  well  defined  secondary  structure  (a 
series of α-helices connected by loops) facilitates a detailed characterization of the
dynamics. 

The  topography  of  the  potential  surface  underlying  the  dynamics  can  be 
explored  by  finding  the  local  energy  minima  associated  with  coordinate  sets 
sequential  in  time  (Stillinger  and  Weber,  1982  ;  Stillinger  and  Weber,  1984). 
Thirty-one  coordinate  sets  (one  every  10  psec)  were  selected  and  their  energy  was 
minimized  with  a  modified  Newton-Raphson  algorithm  suitable  for  large 
molecules  (Brooks  et  al.,  1983).  Since  the  coordinate  sets  all  corresponded  to 
different  minima,  structures  separated  by  shorter  time  periods  were  examined  to 
determine  how  long  the  trajectory  remains  in  a  given  minimum.  Seven  additional 
coordinate  sets  (one  every  0.05  psec)  were  chosen  and  their  behavior  on 
minimization was examined ; if two coordinate sets converged, they corresponded
to  the  same  minimum  ;  if  they  diverged  they  corresponded  to  different  minima. 
The  measure  for  the  distance  between  two  structures  is  their  rms  coordinate 
difference  after  superposition. 
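The minimum-identification test can be sketched as follows, assuming NumPy and the Kabsch algorithm for the optimal rigid-body superposition (the text does not name the superposition method). The convergence threshold `tol` is a hypothetical value; the text states only that coordinate sets that converge on minimization are assigned to the same minimum.

```python
import numpy as np

def superposed_rmsd(a, b):
    """RMS coordinate difference of two (N, 3) structures after optimal
    rigid-body superposition (Kabsch algorithm)."""
    a0 = a - a.mean(axis=0)                  # remove translation
    b0 = b - b.mean(axis=0)
    h = a0.T @ b0                            # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # avoid an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a0 @ rot.T - b0                   # rotate a onto b, then compare
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def same_minimum(min_a, min_b, tol=0.01):
    """Treat two energy-minimized structures as the same minimum if their
    superposed rms difference is below tol (Å; hypothetical threshold)."""
    return superposed_rmsd(min_a, min_b) < tol
```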

Analysis  of  the  short  time  dynamics  demonstrates  that  convergence  occurs 
for  intervals  up  to  0.15  ±  0.05  ps.  Thus,  the  300  ps  simulation  samples  on  the 
order of 2000 different minima ; this is a sizable number but it may
nevertheless  be  small  relative  to  the  total  (finite)  number  of  minima  available  to 
such  a  complex  system  in  the  neighborhood  of  the  native  average  structure  (that 
is,  conformations  that  are  native-like  and  significantly  populated  at  room 
temperature).  The  rms  differences  among  the  minimized  structures  reach  a 
maximum value of approximately 2 Å at about 100 psec. Thus, the difference
vector $(\mathbf{R}_K - \mathbf{R}_{K'})$, where $\mathbf{R}_K$ represents the coordinates of all the atoms in a
native-like conformation K, is restricted to a volume bounded by a radius of 2 Å.
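The estimate of roughly 2000 minima is the simulation length divided by the residence time in a minimum; carrying the quoted ±0.05 ps uncertainty through shows the spread of that estimate. A trivial sketch (the function name is illustrative):

```python
def minima_sampled(total_ps, residence_ps):
    """Approximate number of distinct minima visited, assuming the trajectory
    stays in each minimum for about residence_ps before moving on."""
    return round(total_ps / residence_ps)

for tau in (0.10, 0.15, 0.20):
    print(f"residence {tau:.2f} ps -> about {minima_sampled(300.0, tau)} minima")
```

For the 300 ps trajectory this gives about 3000, 2000, and 1500 minima for residence times of 0.10, 0.15, and 0.20 ps.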

Comparison  of  the  energies  of  the  minimized  structures  shows  that  they 
vary  over  about  20°K  (40  cal/mole)  per  degree  of  freedom.  Since  this  difference 
in  energy  between  the  "inherent"  structures  (Stillinger  and  Weber,  1982  ; 
Stillinger  and  Weber,  1984)  is  small,  they  are  significantly  populated  at  room 
temperature.  Further,  the  large  number  of  such  structures  sampled  by  the  room 
temperature  simulation  suggests  that  the  effective  barriers  separating  them  are 
low  and  that  the  protein  is  undergoing  frequent  transitions  from  one  structure  to 
another.  The  fluctuations  within  a  well  can  be  described  by  a  harmonic  or 
quasiharmonic  model  while  the  transitions  among  the  wells  cannot.  Estimates 
based  on  the  time  development  of  the  rms  atomic  fluctuations  for  mainchain 
atoms  at  room  temperature  (Swaminathan  et  al.,  1982)  indicate  that  20  to  30 
percent  of  the  rms  fluctuations  are  contributed  by  oscillations  within  a  well  and 
70  to  80  percent  arise  from  transitions  among  wells  ;  for  sidechains  the 
contribution  from  transitions  among  the  multiple  wells  is  expected  to  be  larger. 
Since  energy  differences  among  some  of  the  wells  are  small,  molecules  may  be 
trapped  in  metastable  states  at  low  temperatures,  in  analogy  to  third  law  violations 
in  crystals  (e.g.  crystals  of  CO)  and  models  for  the  glassy  state  (Ziman,  1979  ; 
Stillinger and Weber, 1982 ; Stillinger and Weber, 1984 ; Toulouse, 1984 ;
Ansari  et  al.,  1985  ;  Stein,  1985).  A  number  of  experiments  suggest  that  the 
transition  temperature  for  myoglobin  is  in  the  neighborhood  of  200°K  (Austin  et 
al., 1975 ; Parak et al., 1982 ; Debrunner and Frauenfelder, 1982 ; Ansari et al.,
1985).  Because  large  scale,  collective  motions  that  involve  the  protein  surface  are 
important  in  the  fluctuations  (Swaminathan  et  al.,  1982),  it  is  likely  that  the 
observed  transition  is  due  to  the  freezing  of  the  solvent  matrix  (Swaminathan  et 
al.,  1982  ;  Parak  et  al.,  1982). 
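Quoting an energy in kelvin means dividing by the gas constant (per mole), which is how 20°K corresponds to about 40 cal/mole. The sketch below makes that conversion and evaluates a Boltzmann factor to compare this per-degree-of-freedom energy spread with the thermal energy at 300°K and at a low temperature. It is an order-of-magnitude illustration only; the names are hypothetical.

```python
import math

R_CAL = 1.987204  # gas constant in cal/(mol K)

def kelvin_to_cal_per_mol(t_kelvin):
    """Convert an energy quoted as a temperature (E/k_B) to cal/mol."""
    return R_CAL * t_kelvin

def boltzmann_factor(delta_e_cal, temperature):
    """Relative weight exp(-dE / RT) of a state lying dE (cal/mol) higher."""
    return math.exp(-delta_e_cal / (R_CAL * temperature))

spread = kelvin_to_cal_per_mol(20.0)   # roughly the "40 cal/mole" of the text
print(f"{spread:.1f} cal/mol")
print(f"300 K: {boltzmann_factor(spread, 300.0):.3f}")  # close to 1
print(f" 10 K: {boltzmann_factor(spread, 10.0):.3f}")   # strongly reduced
```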

Because  the  details  of  the  native  structure  of  a  protein  play  an  essential  role 
in  its  function,  it  is  important  to  determine  the  structural  origins  of  the 
multiminimum surface obtained from the dynamics analysis. The general features
of  the  structure  (helices  and  turns)  are  preserved  throughout  the  simulation  and 
the  differences  in  position  are  widely  distributed.  The  motions  are  associated 
primarily  with  loop  displacements  or  relative  displacements  of  helices  which 
individually  behave  as  nearly  rigid  bodies.  Rearrangements  within  individual 
loops  are  the  elementary  step  in  the  transition  from  one  minimum  to  another  ; 
they  are  coupled  with  associated  helix  displacements.  Which  loop  or  turn  changes 
in  a  given  time  interval  appears  to  be  random.  Specific  loop  motions  may  be 
initiated  by  sidechain  transitions  in  the  helix  contacts,  mainchain  dihedral  angle 
transitions  of  the  loops  themselves,  or  a  combination  of  the  two.  As  the  time 
interval  between  two  structures  increases,  more  loop  transitions  have  occurred. 
At  room  temperature,  the  transition  probabilities  are  such  that  for  an  interval  100 
psec  or  longer  between  two  structures,  some  transitions  will  have  taken  place  in 
all  of  the  flexible  loop  regions.  However,  since  the  rms  differences  between 
structures  continue  to  increase  up  to  200  psec,  the  configuration  space  available  to 
the  molecule  includes  a  range  of  structures  for  the  loop  regions  that  are  not 
completely sampled in 100 psec.

To  characterize  the  helix  motions  that  are  coupled  with  the  loop 
rearrangements,  the  internal  structural  changes  of  the  helices  were  separated 
from  their  relative  motions.  Individual  helices  and  loops  were  superimposed  and 
the  rms  differences  for  the  mainchain  calculated  for  the  set  of  structures  ;  the  rms 
difference for the internal structure of the helices is generally less than 1 Å.
Corresponding results for the loop regions show that they undergo much larger
internal structural changes on the order of 2.5 Å.

In  analysing  the  relative  motion  of  the  helices,  it  is  of  particular  interest  to 
examine  the  behavior  of  helix  pairs  that  are  in  van  der  Waals  contact ;  these  are 
helix  pairs  A-H,  B-E,  B-G,  F-H  and  G-H  for  all  of  which  at  least  three  residues 
from  each  helix  are  interacting.  Each  helix  was  fitted  to  a  straight  line  and  the 
fluctuations  of  the  distance  between  the  helix  centers  of  mass  and  the  relative 
orientations  of  the  lines  were  compared.  The  relative  translations  found  in  this 
case have rms values of 0.3 to 0.7 Å and the relative rotations have rms values of
1 to 14°; the maximum differences are 1.3 to 2.2 Å and 5 to 39°, respectively.

The  dynamical  results  for  the  helix  motions  can  be  compared  with 
structural  data  from  two  sources  ;  the  first  is  derived  from  proteins  of  a  given 
sequence  in  different  environments  (e.g.,  two  different  crystal  forms,  deoxy  and 
oxy  hemoglobin  ;  Chothia  and  Lesk,  1985)  and  the  second  from  homologous 
proteins  with  different  sequences  (e.g.,  the  globins  ;  Lesk  and  Chothia,  1980). 
The  maximum  dynamical  displacements  are,  in  fact,  larger  than  those  observed  in 
different  x-ray  structures  of  a  given  protein.  The  values  are  of  the  same  order  as 
the differences (2 to 3 Å, 15 to 30° ; there are some larger changes) found in
comparing  a  series  of  different  globins  with  known  crystal  structures  and 
sequence homology in the range 16 to 88 percent. Thus, the range of
conformations  sampled  by  a  single  myoglobin  trajectory  is  similar  to  that  found 
in  the  evolutionary  variation  among  crystal  structures  of  the  globin  series.  This 
suggests  a  molecular  plasticity  which  is  likely  to  have  played  an  important  role  in 
the  evolution  of  protein  sequences. 

The  comparison  of  the  various  globin  structures  (Lesk  and  Chothia,  1980) 
suggested  that  the  range  of  helix  packings  is  achieved  primarily  by  changes  in 
sidechain volumes resulting from amino acid substitutions. In the dynamics, it is
the  correlated  motions  of  sidechains  that  are  in  contact,  plus  the  rearrangements 
of  loops,  that  make  possible  the  observed  helix  fluctuations.  Different  positions 
within wells and transitions between wells for sidechains (e.g., ±60°, 180° for
$\chi_1$) are involved. This is in accord with the results of high-resolution x-ray
studies  that  show  significant  disorder  in  sidechain  orientations  (Smith  et  al.,  1986; 
Kuriyan  et  al.,  1987).  Further,  correlated  dihedral  angle  changes  differentiate  the 
various  minima.  Since  more  than  one  set  of  sidechain  orientations  is  consistent 
with  a  given  set  of  helix  positions,  the  known  globin  crystal  structures  probably 
represent  only  a  small  subset  of  the  possible  local  minima. 
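The sidechain well transitions mentioned above can be tracked with a dihedral-angle calculation and a nearest-well assignment. The sketch below, assuming NumPy, uses the standard four-point dihedral formula and assigns $\chi_1$ to the nearest of the three staggered wells (+60°, 180°, -60°). This nearest-center rule is a simplification chosen for illustration, not a criterion from the text.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b1 = b1 / np.linalg.norm(b1)
    b2 = p3 - p2
    v = b0 - np.dot(b0, b1) * b1   # part of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1   # part of b2 perpendicular to the central bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

WELLS = np.array([60.0, 180.0, -60.0])  # staggered chi1 wells (degrees)

def chi1_well(angle):
    """Nearest staggered well, using periodic (mod 360) angular distance."""
    delta = (angle - WELLS + 180.0) % 360.0 - 180.0
    return int(WELLS[np.argmin(np.abs(delta))])

def count_well_transitions(angles):
    """Number of changes of well along a time series of chi1 values."""
    wells = [chi1_well(a) for a in angles]
    return sum(1 for w0, w1 in zip(wells, wells[1:]) if w0 != w1)
```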

Myoglobin  at  normal  room  temperatures  samples  a  very  large  number  of 
different  minima  that  arise  from  the  inhomogeneity  of  the  system.  This  is 
expected  to  have  important  consequences  for  the  interpretation  of  myoglobin 
function  and,  more  generally,  for  the  functions  of  other  proteins,  including 
enzymes.  There  are  solid-like  microdomains  (the  helices),  whose  mainchain 
structure  is  relatively  rigid,  and  liquid-like  regions  (the  loops  and  the  sidechain 
clusters at interhelix contacts) that readjust as the helices move from one minimum
to  another.  Since  the  minima  have  similar  energies  myoglobin  is  expected  to  be 
glass-like  at  low  temperatures.  Freezing  in  of  the  liquid-like  regions  could  result 
in  a  transition  to  the  glassy  state  (Stein,  1985). 

ATOMIC  FLUCTUATION  AND  X-RAY  DIFFRACTION 

Since  atomic  fluctuations  are  the  basis  of  protein  dynamics,  it  is  important 
to  have  experimental  tests  of  the  accuracy  of  the  simulation  results  concerning 
them.  For  the  magnitudes  of  the  motions,  the  most  detailed  data  are  provided,  in 
principle,  by  an  analysis  of  the  Debye-Waller  or  temperature  factors  obtained  in 
crystallographic  refinements  of  x-ray  structures.  Averages  over  the  fluctuations 
can  be  obtained  also  from  incoherent  neutron  scattering  (Doster  et  al.,  1989). 

It  is  well  known  from  small-molecule  crystallography  that  the  effects  of 
thermal  motion  must  be  included  in  the  interpretation  of  the  x-ray  data  to  obtain 
accurate  structural  results.  Detailed  models  have  been  introduced  to  take  account 
of anisotropic and anharmonic motions of the atoms, and these models have
been  applied  to  high-resolution  data  for  small  molecules  (Zucker  and  Schultz, 
1982).  In  protein  crystallography,  the  limited  data  available  relative  to  the  large 
number of parameters that have to be determined have made it necessary in most
cases  to  assume  that  the  atomic  motions  are  isotropic  and  harmonic.  Then  the 
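structure factor, $F(\mathbf{Q})$, which is related to the measured intensity by $I(\mathbf{Q}) =
|F(\mathbf{Q})|^2$, is given by

$$F(\mathbf{Q}) = \sum_{j=1}^{N} f_j(\mathbf{Q})\, e^{i\mathbf{Q}\cdot\langle \mathbf{r}_j \rangle}\, e^{W_j(\mathbf{Q})} \qquad (3)$$

where $\mathbf{Q}$ is the scattering vector, $\langle \mathbf{r}_j \rangle$ is the average position of atom $j$ with
atomic scattering factor $f_j(\mathbf{Q})$ and the sum is over the $N$ atoms in the asymmetric
unit of the crystal. The Debye-Waller factor, $W_j(\mathbf{Q})$, is defined by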

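$$W_j(\mathbf{Q}) = -\frac{8}{3}\pi^2 \langle \Delta r_j^2 \rangle s^2 = -B_j s^2 \qquad (4)$$

where $s = |\mathbf{Q}|/4\pi$. The quantity $B_j$ is usually referred to as the temperature
factor, which is directly related to the mean-square atomic fluctuations in the
isotropic harmonic model. More generally, if the motion is harmonic but
anisotropic, a set of six parameters

$$B_j^{xx} = \langle \Delta x_j^2 \rangle,\quad B_j^{xy} = \langle \Delta x_j \Delta y_j \rangle,\quad \ldots,\quad B_j^{zz} = \langle \Delta z_j^2 \rangle \qquad (5)$$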

is  required  to  fully  characterize  the  atomic  motion.  Although  in  the  earlier  x-ray 
studies  of  proteins,  the  significance  of  the  temperature  factors  was  ignored 
(presumably  because  the  data  were  not  at  a  sufficient  level  of  resolution  and 
accuracy),  more  recently  attempts  have  been  made  to  relate  the  observed 
temperature  factors  to  the  atomic  motions  (Frauenfelder  et  al.,  1979  ;  Artymiuk 
et  al.,  1979). 
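Equations (3) and (4) can be evaluated directly for a small set of atoms. The sketch below, assuming NumPy, treats each $f_j$ as a constant (a real atomic scattering factor falls off with $|\mathbf{Q}|$) and includes the conversion $B_j = \frac{8}{3}\pi^2\langle \Delta r_j^2 \rangle$ implied by Eq. (4). It reproduces the correspondence between a mean-square fluctuation of 0.75 Å² and $B \approx 20$ Å² quoted later in this section. Function names are illustrative.

```python
import numpy as np

def b_from_msf(msf):
    """Eq. (4): B_j = (8/3) pi^2 <dr_j^2>, with s = |Q| / (4 pi)."""
    return 8.0 * np.pi ** 2 * np.asarray(msf) / 3.0

def structure_factor(q, positions, f, b):
    """Eq. (3) with the isotropic Debye-Waller factor of Eq. (4).

    q: (3,) scattering vector (1/Å); positions: (N, 3) mean positions (Å);
    f: (N,) scattering factors, treated as constants; b: (N,) B factors (Å^2).
    """
    s2 = (np.linalg.norm(q) / (4.0 * np.pi)) ** 2
    phases = np.exp(1j * positions @ q)
    return complex(np.sum(f * phases * np.exp(-b * s2)))

def intensity(q, positions, f, b):
    """I(Q) = |F(Q)|^2."""
    return abs(structure_factor(q, positions, f, b)) ** 2

print(float(b_from_msf(0.75)))  # about 19.7, i.e. B close to 20 Å^2
```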

In  principle,  the  temperature  factors  provide  a  very  detailed  measure  of 
these  motions  because  information  is  available  for  the  mean-square  fluctuation  of 
every  heavy  atom.  In  practice,  there  are  two  types  of  difficulties  in  relating  the  B 
factors  obtained  from  protein  refinements  to  the  atomic  motions.  The  first  is  that, 
in  addition  to  thermal  fluctuations,  any  static  (lattice)  disorder  in  the  crystal 
contributes  to  the  B  factors  ;  i.e.,  since  a  crystal  is  made  up  of  many  unit  cells, 
different  molecular  geometries  in  the  various  cells  have  the  same  effect  on  the 
average  electron  density,  and  therefore  the  B  factor,  as  atomic  motions.  For  the 
iron  atom  of  myoglobin  there  has  been  an  experimental  attempt  to  determine  the 
disorder contribution (Hartmann et al., 1982). Since the Mössbauer effect is not
altered by static disorder (i.e., each nucleus absorbs independently), but does
depend on atomic motions, comparisons of Mössbauer and x-ray data have been
used to estimate a disorder contribution for the iron atom ; the value obtained is

$$\langle \Delta r_{\mathrm{Fe}}^2 \rangle = 0.08\ \text{Å}^2$$

Although  the  value  is  only  approximate,  it  nevertheless  indicates  that  the  observed 
B factors (e.g., on the order of 0.44 Å² for backbone atoms and 0.50 Å² for
sidechain  atoms)  are  dominated  by  the  motional  contribution.  Most  experimental 
B  factor  values  are  compared  directly  with  the  molecular  dynamics  results  (i.e., 
neglecting  the  disorder  contribution)  or  are  rescaled  by  a  constant  amount  (e.g., 
by  setting  the  smallest  observed  B  factor  to  zero)  on  the  assumption  that  the 
disorder  contribution  is  the  same  for  all  atoms  (Petsko  and  Ringe,  1984).  The 
second  difficulty  is  that,  since  simulations  have  shown  that  the  atomic  fluctuations 
are  highly  anisotropic  and,  in  some  cases,  anharmonic,  there  may  be  significant 
errors  in  the  refinement  due  to  the  assumption  of  isotropic  and  harmonic  motion. 
A  direct  experimental  estimate  of  the  errors  is  difficult  because  sufficient  data  are 
not  yet  available  for  protein  crystals,  although  incoherent  neutron  scattering  can 
provide  information  independent  of  static  disorder  (Doster  et  al.,  1989). 
Moreover,  any  data  set  includes  other  errors  which  would  obscure  the  analysis. 
As  an  alternative  to  an  experimental  analysis  of  the  errors  in  the  refinement  of 
proteins,  a  purely  theoretical  approach  can  be  used  (Kuriyan  et  al.,  1986).  The 
basic  idea  is  to  generate  x-ray  data  from  a  molecular  dynamics  simulation  of  a 
protein  and  to  use  these  data  in  a  standard  refinement  procedure.  The  error  in  the 
analysis  can  then  be  determined  by  comparing  the  refined  x-ray  structure  and 
temperature  factors  with  the  average  structure  and  the  mean-square  fluctuations 
from  the  simulation.  Such  a  comparison,  in  which  no  real  experimental  results 
are  used,  avoids  problems  due  to  inaccuracies  in  the  measured  data  (exact 
calculated  intensities  are  used),  to  crystal  disorder  (there  is  none  in  the  model), 
and  to  approximations  in  the  simulation  (the  simulation  is  exact  for  this  case).  The 
only  question  about  such  a  comparison  is  whether  the  atomic  motions  found  in  the 
simulation  are  a  meaningful  representation  of  those  occurring  in  proteins.  As  has 
been  shown  (Petsko  and  Ringe,  1984  ;  Karplus  and  McCammon,  1983),  molecular 
dynamics  provides  a  reasonable  picture  of  the  motions  in  spite  of  errors  in  the 
potentials,  the  neglect  of  the  crystal  environment,  and  the  finite-time  classical 
trajectories  used  to  obtain  the  results.  However,  these  inaccuracies  do  not  affect 
the  exactitude  of  the  computer  "experiment"  for  testing  the  refinement  procedure 
that  is  described  below. 
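The constant-offset rescaling described above is easy to state in code. In this sketch (NumPy assumed, names illustrative), subtracting the smallest B factor removes any disorder contribution that is the same for every atom. A correlation coefficient between experimental and simulated B factors is unaffected by such a shift, so the rescaling matters only when absolute magnitudes are compared.

```python
import numpy as np

def rescale_b(b):
    """Remove a disorder term assumed equal for all atoms by setting min(B) to zero."""
    b = np.asarray(b, dtype=float)
    return b - b.min()

def compare_b(b_exp, b_md):
    """RMS difference and Pearson correlation between two sets of B factors."""
    b_exp = np.asarray(b_exp, dtype=float)
    b_md = np.asarray(b_md, dtype=float)
    rms = float(np.sqrt(np.mean((b_exp - b_md) ** 2)))
    corr = float(np.corrcoef(b_exp, b_md)[0, 1])
    return rms, corr
```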

In this study (Kuriyan et al., 1986), a 25-psec molecular dynamics
trajectory  for  myoglobin  was  used  (Levy  et  al.,  1985).  The  average  structure  and 
the  mean-square  fluctuations  from  that  structure  were  calculated  directly  from 
the  trajectory.  To  obtain  the  average  electron  density,  appropriate  atomic  electron 
distributions  were  assigned  to  the  individual  atoms  and  the  results  for  each 
coordinate set were averaged over the trajectory. Given the symmetry, unit cell
dimensions, and position of the myoglobin molecule in the unit cell, average
structure factors, $\langle F(\mathbf{Q}) \rangle$, and intensities, $I(\mathbf{Q}) = |F(\mathbf{Q})|^2$, were calculated from
the Fourier transform of the average electron density, $\langle \rho(\mathbf{r}) \rangle$, as a function of
position $\mathbf{r}$ in the unit cell. Data were generated at 1.5 Å resolution, since this is
comparable  to  the  resolution  of  the  best  x-ray  data  currently  available  for 
proteins  the  size  of  myoglobin  (Kuriyan  et  al.,  1987).  The  resulting  intensities  at 
Bragg  reciprocal  lattice  points  were  used  as  input  data  for  the  widely  applied 
crystallographic  program,  PROLSQ  (Konnert  and  Hendrickson,  1980).  The 
time-averaged atomic positions obtained from the simulation, together with a uniform
temperature factor, provide the initial model for refinement. The positions and an
isotropic,  harmonic  temperature  factor  for  each  atom  were  then  refined 
iteratively  against  the  computer-generated  intensities  in  the  standard  way. 
Differences  between  the  refined  results  for  the  average  atomic  positions  and  their 
mean-square  fluctuations  and  those  obtained  from  the  molecular  dynamics 
trajectory  are  due  to  errors  introduced  by  the  refinement  procedure. 
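The reference quantities in this test, the average structure and the per-atom mean-square fluctuations, follow directly from the stored coordinates. A minimal NumPy sketch, assuming a (frames, atoms, 3) array already superimposed on a common reference frame and using the isotropic conversion of Eq. (4); function names are illustrative.

```python
import numpy as np

def average_and_msf(traj):
    """traj: (frames, atoms, 3), assumed already superimposed on a common frame.

    Returns the average structure (atoms, 3) and the per-atom mean-square
    fluctuation <|r - <r>|^2> (atoms,).
    """
    mean = traj.mean(axis=0)
    msf = np.mean(np.sum((traj - mean) ** 2, axis=2), axis=0)
    return mean, msf

def b_from_msf(msf):
    """Isotropic temperature factor, Eq. (4): B = (8/3) pi^2 <dr^2>."""
    return 8.0 * np.pi ** 2 * np.asarray(msf) / 3.0

def msf_from_b(b):
    """Inverse of b_from_msf."""
    return 3.0 * np.asarray(b) / (8.0 * np.pi ** 2)
```

For instance, the exact backbone value of 12.4 Å² quoted below corresponds to a mean-square fluctuation of about 0.47 Å².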

The overall rms error in atomic positions ranged from 0.24 Å to 0.29 Å
for slightly different restrained and unrestrained refinement procedures (Kuriyan
et al., 1986). The errors in backbone positions (0.10 to 0.20 Å) are generally less
than those for sidechain atoms (0.28 to 0.33 Å) ; the largest positional errors are
on the order of 0.6 Å. The backbone errors, although small, are comparable to
the rms deviation of 0.21 Å between the positions of the backbone atoms in the
refined  experimental  structures  of  oxymyoglobin  and  carboxymyoglobin 
(Kuriyan  et  al.,  1987  ;  Phillips,  1980).  Further,  the  positional  errors  are  not 
uniform  over  the  whole  structure.  There  is  a  strong  correlation  between  the 
positional  error  and  the  magnitude  of  the  mean-square  fluctuation  for  an  atom, 
with  certain  regions  of  the  protein,  such  as  loops  and  external  sidechains,  having 
the  largest  errors. 

The  refined  mean-square  fluctuations  are  systematically  smaller  than  the 
fluctuations  calculated  directly  from  the  simulation.  The  magnitudes  and  variation 
of  temperature  factors  along  the  backbone  are  relatively  well  reproduced,  but  the 
refined  sidechain  fluctuations  are  almost  always  significantly  smaller  than  the 
actual  values.  The  average  backbone  B  factors  from  different  refinements  are  in 
the range 11.3 to 11.7 Å², as compared with the exact value of 12.4 Å² ; for the
sidechains, the refinements yield 16.5 to 17.6 Å², relative to the exact value of
26.8 Å². Regions of the protein that have high mobility have large errors in
temperature factors as well as in positions. Examination of all atoms shows that
fluctuations greater than about 0.75 Å² (B = 20 Å²) are almost always
underestimated by the refinement. Moreover, while actual mean-square atomic
fluctuations have values as large as 5 Å², the x-ray refinement leads to an
effective upper limit of about 2 Å². This arises from the fact that most of the
atoms with large fluctuations have multiple conformations and that the refinement
procedure  picks  out  one  of  them. 
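A toy model shows why picking out one conformation caps the refined fluctuation. Assume, hypothetically, an atom that spends equal time in two positions separated by a distance $d$ along one axis, with an isotropic Gaussian spread of per-component variance $\sigma^2$ about each. Its true mean-square fluctuation is $(d/2)^2 + 3\sigma^2$, whereas a single-position model centred on one conformer accounts for only about $3\sigma^2$. The numbers are illustrative, not taken from the myoglobin refinement.

```python
import numpy as np

def two_state_msf(d, sigma2):
    """Exact <|r - <r>|^2> for equal occupancy of two sites a distance d apart."""
    return (d / 2.0) ** 2 + 3.0 * sigma2

def single_state_msf(sigma2):
    """What a model centred on one of the two sites would account for."""
    return 3.0 * sigma2

def sampled_msf(d, sigma2, n, seed=0):
    """Monte Carlo check of two_state_msf."""
    rng = np.random.default_rng(seed)
    centers = np.zeros((n, 3))
    centers[:, 0] = np.where(rng.random(n) < 0.5, -d / 2.0, d / 2.0)
    pos = centers + rng.normal(scale=np.sqrt(sigma2), size=(n, 3))
    return float(np.mean(np.sum((pos - pos.mean(axis=0)) ** 2, axis=1)))

# Two sites 3 Å apart, sigma^2 = 0.2 Å^2: true msf 2.85 Å^2 versus about 0.6 Å^2.
print(two_state_msf(3.0, 0.2), single_state_msf(0.2))
```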

To  do  refinements  that  take  some  account  of  anisotropic  motions  for  all  but 
the  smallest  proteins,  it  has  been  necessary  to  introduce  assumptions  concerning 
the  nature  of  the  anisotropy.  One  possibility  is  to  assume  anisotropic  rigid  body 
motions  for  sidechains  such  as  tryptophan  and  phenylalanine  (Artymiuk  et  al., 
1979;  Glover  et  al.,  1983).  An  alternative  is  to  introduce  a  "dictionary"  in  which 
the  orientation  of  the  anisotropy  tensor  is  related  to  the  stereochemistry  around 
each  atom  (Konnert  and  Hendrickson,  1980)  ;  this  reduces  the  six  independent 
parameters  of  the  anisotropic  temperature  factor  tensor  Bj  to  three  parameters 
per  atom.  An  analysis  of  a  simulation  for  BPTI  (Yu  et  al.,  1985)  has  shown  that 
the  actual  anisotropies  in  the  atomic  motions  are  generally  not  simply  related  to 
the  local  stereochemistry  ;  an  exception  is  the  mainchain  carbonyl  oxygen,  which 
has its largest motion perpendicular to the C=O bond. Thus, use of stereochemical
assumptions  in  the  refinement  can  yield  incorrectly  oriented  anisotropy  tensors 
and  significantly  reduced  values  for  the  anisotropies.  The  large-scale  motions  of 
atoms  are  collective  and  sidechains  tend  to  move  as  a  unit  so  that  the  directions  of 
largest  motion  are  not  related  to  the  local  bond  direction  and  have  similar 
orientations  in  the  different  atoms  forming  a  group  that  is  undergoing  correlated 
motions.  Consequently,  it  is  necessary  to  use  the  full  anisotropy  tensor  to  obtain 
meaningful  results.  This  is  possible  with  proteins  that  are  particularly  well 
ordered, so that the diffraction data extend to better than 1 Å resolution.
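Given a trajectory, the full anisotropy tensor of Eq. (5) and its principal axes are straightforward to compute; this is the quantity a stereochemical dictionary tries to approximate. The sketch below (NumPy assumed, names illustrative) builds the 3×3 tensor from one atom's displacements, then reports the ratio of smallest to largest principal mean-square displacement (1 for isotropic motion) and the direction of largest motion, which could be compared with a local bond direction.

```python
import numpy as np

def anisotropy_tensor(displacements):
    """Symmetric 3x3 tensor U[a, b] = <d_a d_b> for one atom (the six parameters of Eq. (5)).

    displacements: (frames, 3) positions or deviations of the atom; the mean is removed.
    """
    d = displacements - displacements.mean(axis=0)
    return d.T @ d / len(d)

def principal_motion(u):
    """(anisotropy, direction): ratio of smallest to largest principal
    mean-square displacement and the unit vector of largest motion."""
    evals, evecs = np.linalg.eigh(u)   # eigenvalues in ascending order
    return float(evals[0] / evals[-1]), evecs[:, -1]
```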

X-RAY  REFINEMENT  BY  SIMULATED  ANNEALING 

Crystallographic  structure  determinations  by  x-ray  or  neutron  diffraction 
generally  proceed  in  two  stages.  First,  the  phases  of  the  measured  reflections  are 
estimated  and  a  low-  to  medium-resolution  model  of  the  protein  is  constructed 
and  second,  more  precise  information  about  the  structure  is  obtained  by  refining 
the  parameters  of  the  molecular  model  against  the  crystallographic  data  (Wyckoff 
et  al.,  1985).  The  refinement  is  performed  by  minimizing  the  crystallographic  R 
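factor, which is defined as the sum of the absolute differences between the observed
($|F_{\mathrm{obs}}(h,k,l)|$) and calculated ($|F_{\mathrm{calc}}(h,k,l)|$) structure factor amplitudes,
normalized by the sum of the observed amplitudes,

$$R = \frac{\sum_{h,k,l} \big|\, |F_{\mathrm{obs}}(h,k,l)| - |F_{\mathrm{calc}}(h,k,l)| \,\big|}{\sum_{h,k,l} |F_{\mathrm{obs}}(h,k,l)|} \qquad (6)$$

where $h,k,l$ index the reciprocal lattice points of the crystal.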
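Eq. (6) translates directly into code. In this minimal NumPy sketch, the optional overall scale factor $k = \sum |F_{\mathrm{obs}}||F_{\mathrm{calc}}| / \sum |F_{\mathrm{calc}}|^2$, which puts calculated amplitudes on the observed scale, is an assumption added here; it is not part of Eq. (6) as written.

```python
import numpy as np

def r_factor(f_obs, f_calc, scale=False):
    """Crystallographic R factor, Eq. (6).

    f_calc may be complex; only amplitudes enter. If scale is True, |F_calc|
    is first multiplied by the least-squares scale factor
    k = sum(|Fo||Fc|) / sum(|Fc|^2) (an added assumption).
    """
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=complex))
    if scale:
        fc = fc * np.sum(fo * fc) / np.sum(fc ** 2)
    return float(np.sum(np.abs(fo - fc)) / np.sum(fo))
```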

Conventional  refinement  involves  a  series  of  steps,  each  consisting  of  a  few 
cycles  of  least-squares  refinement  with  stereochemical  and  internal  packing 
constraints  or  restraints  (Sussman  et  al.,  1977  ;  Jack  and  Levitt,  1978  ;  Konnert 
and Hendrickson, 1980 ; Moss and Morffew, 1982) that are followed by manual
rebuilding  of  the  model  structure  by  use  of  interactive  computer  graphics. 
Finally,  solvent  molecules  are  included  and  alternative  conformations  for  some 
protein  atoms  may  be  introduced.  The  standard  refinement  procedure  is  time 
consuming,  because  the  limited  radius  of  convergence  of  least-squares  algorithms 
(approximately 1 Å) necessitates the periodic examination of electron density
maps computed with various combinations of $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ as amplitudes, and
with  phases  calculated  from  the  model  structure.  Also,  the  least-squares 
refinement  process  is  easily  trapped  in  a  local  minimum  so  that  human 
intervention  is  necessary. 

Simulated annealing (Kirkpatrick et al., 1983), which makes use of Monte
Carlo or molecular dynamics (Brünger et al., 1987a) simulations to explore the
conformational space of the molecule, can help to overcome the local-minimum
problem. This has been demonstrated in the application of molecular dynamics to
structure determination with nuclear magnetic resonance (NMR) data. In contrast
to the NMR application (Brünger et al., 1986), the initial model for
crystallographic refinement cannot be arbitrary. It has to be relatively close to the
correct  geometry  to  provide  an  adequate  approximation  to  the  phases  of  the 
structure  factors. 
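The difference between local minimization and annealing can be illustrated on a hypothetical one-dimensional energy with many local minima, unrelated to any protein. Steepest descent from $x = 0$ stays in the nearest well, while a Metropolis Monte Carlo simulated annealing run with geometric cooling, followed by a short quench, reaches a much lower energy. The landscape, cooling schedule, and step size are illustrative choices only.

```python
import math
import random

def energy(x):
    """Hypothetical rugged landscape: local minima near each integer,
    global minimum at x = 5."""
    return 0.02 * (x - 5.0) ** 2 + (1.0 - math.cos(2.0 * math.pi * x))

def gradient(x):
    return 0.04 * (x - 5.0) + 2.0 * math.pi * math.sin(2.0 * math.pi * x)

def steepest_descent(x, step=0.01, n_steps=5000):
    """Local minimizer: follows the gradient into the nearest well."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

def simulated_annealing(x, t_start=3.0, t_end=0.01, n_steps=20000, step=0.5, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns the lowest-energy point visited."""
    rng = random.Random(seed)
    e = energy(x)
    best_x, best_e = x, e
    for k in range(n_steps):
        t = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        x_new = x + rng.gauss(0.0, step)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill ones with probability exp(-dE/T).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x

x_local = steepest_descent(0.0)
x_annealed = steepest_descent(simulated_annealing(0.0))  # anneal, then quench
print(f"local minimization: x = {x_local:.2f}, E = {energy(x_local):.3f}")
print(f"annealing + quench: x = {x_annealed:.2f}, E = {energy(x_annealed):.3f}")
```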

To  employ  molecular  dynamics  in  crystallographic  refinement,  an  effective 
potential 

$$E_{SF} = S \sum_{h,k,l} \big[\, |F_{\mathrm{obs}}(h,k,l)| - |F_{\mathrm{calc}}(h,k,l)| \,\big]^2 \qquad (7)$$

was added to the empirical energy potential given in Eq. 1. The effective potential
$E_{SF}$ describes the differences between the observed structure factor amplitudes and
those calculated from the atomic model ; it is identical to the function used in
standard least-squares refinement methods (Jack and Levitt, 1978). The scale
factor $S$ was chosen to make the gradient of $E_{SF}$ comparable in magnitude to the
gradient  of  the  empirical  energy  potential  of  a  molecular  dynamics  simulation 
with  S  set  to  zero. 
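For molecular dynamics, what matters is the force contributed by $E_{SF}$. The sketch below (NumPy assumed) evaluates Eq. (7) and its analytic gradient for point atoms, taking each $f_j$ as a constant and omitting temperature factors, with one Cartesian scattering vector per reflection in `q_vectors`. It then picks $S$ as a ratio of gradient norms; that norm-matching rule is one hypothetical way to make the gradients "comparable in magnitude", since the text does not say how $S$ was computed.

```python
import numpy as np

def structure_factors(q_vectors, positions, f):
    """Complex F_calc for each scattering vector (point atoms, constant f_j, no B)."""
    phases = np.exp(1j * q_vectors @ positions.T)   # (M reflections, N atoms)
    return phases @ f

def e_sf(q_vectors, positions, f, f_obs, s):
    """Eq. (7): S * sum over reflections of (|F_obs| - |F_calc|)^2."""
    amp = np.abs(structure_factors(q_vectors, positions, f))
    return s * np.sum((f_obs - amp) ** 2)

def grad_e_sf(q_vectors, positions, f, f_obs, s):
    """Analytic gradient of Eq. (7) with respect to atomic positions, shape (N, 3)."""
    phases = np.exp(1j * q_vectors @ positions.T)   # (M, N)
    fc = phases @ f                                 # (M,)
    amp = np.abs(fc)
    # d|F|/dr_j = Re(conj(F) * i * f_j * exp(i Q.r_j)) * Q / |F|
    damp = np.real(np.conj(fc)[:, None] * 1j * f[None, :] * phases) / amp[:, None]
    weight = -2.0 * s * (f_obs - amp)               # dE/d|F| for each reflection
    return np.einsum("m,mn,mk->nk", weight, damp, q_vectors)

def choose_scale(grad_empirical, grad_xray_unscaled):
    """Hypothetical rule: match the norm of S * grad(E_SF at S = 1) to grad(E_empirical)."""
    return np.linalg.norm(grad_empirical) / np.linalg.norm(grad_xray_unscaled)
```

In use, `grad_empirical` would come from a simulation with $S$ set to zero and `grad_xray_unscaled` from `grad_e_sf(..., s=1.0)` on the same coordinates.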

As  in  the  case  of  the  NMR  analysis,  simulated  annealing  refinement  was 
also  tested  on  crambin,  for  which  high-resolution  x-ray  diffraction  data  and  a 
refined  structure,  determined  by  resolved  anomalous  phasing  and  conventional 
least-squares  refinement  with  model-building,  are  available  (Hendrickson  and 
Teeter,  1981).  The  initial  structure  for  the  MD-refinement  was  obtained  from  the 
NMR  structure  determination  (see  Sect.  IV) ;  the  orientation  and  position  of  the 
NMR-derived  crambin  molecule  in  the  unit  cell  was  determined  by  molecular 
replacement (Brünger et al., 1987b). The root-mean-square (rms) differences for
residue positions of this initial structure and the final manually refined structure
(Hendrickson and Teeter, 1981) are as large as 3.5 Å, with particularly large
differences for residues 34 to 40 ; the R factor of the initial structure was 0.56 at
2 Å resolution. MD-refinement at 3000 K starting with 4 Å resolution data for
2.5 ps, extending to 3 Å resolution for 2.5 ps, and finally to 2 Å resolution for 5
ps, followed by several cycles of minimization, reduces the atomic rms deviations
to 0.34 and 0.56 Å for the backbone and sidechain atoms, respectively. During
the MD-refinement, some atoms in residues 35 to 40 moved by more than 3 Å.
The  essential  point  is  that  the  refinement  of  the  crambin  structure  was  achieved 
starting  from  the  initial  NMR-structure  without  human  intervention.  The  R  factor 
(0.294)  of  the  MD-refined  structure  is  somewhat  higher  than  the  R  factor  (0.258) 
of  the  manually  refined  structure  without  solvent  and  with  constant  temperature 
factors  ;  minor  model-building  would  correct  this  difference.  Other  annealing 
protocols  using  higher  temperatures  (e.g.,  7000  to  9000°K)  yield  structures  that 
are  still  closer  to  the  manually  refined  structure.  The  refinement  required 
approximately  one  hour  of  central  processing  unit  (CPU)  time  on  CRAY-1  ; 
structure  factor  calculations  accounted  for  about  half  this  time.  The  latter  portion 
of  the  calculation  has  been  considerably  leduced  in  time  by  use  of  Fast  Fourier 
Transform  (FFT)  methods  (Briinger,  1989). 
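
The R factors quoted above measure the disagreement between observed and calculated structure-factor amplitudes, R = Σ | |F_obs| - k|F_calc| | / Σ |F_obs|, summed over all reflections. The minimal sketch below computes it for two lists of amplitudes. The least-squares choice of the scale factor k and the function names are illustrative assumptions, not the conventions of any particular refinement program mentioned in the text.

```python
from typing import Optional, Sequence


def scale_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Scale k minimizing sum((|Fo| - k|Fc|)^2); one simple choice among several."""
    num = sum(abs(fo) * abs(fc) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fc) ** 2 for fc in f_calc)
    return num / den


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             k: Optional[float] = None) -> float:
    """Crystallographic R = sum | |Fo| - k|Fc| | / sum |Fo|."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need equal-length, non-empty amplitude lists")
    if k is None:
        k = scale_factor(f_obs, f_calc)
    num = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)


if __name__ == "__main__":
    # Calculated amplitudes that are exactly proportional to the observed ones give R = 0.
    print(r_factor([100.0, 50.0, 25.0], [200.0, 100.0, 50.0]))
    # With a fixed scale, 2 + 2 units of error over 20 units observed gives R = 0.2.
    print(r_factor([10.0, 10.0], [12.0, 8.0], k=1.0))
```

Because R is normalized by the observed amplitudes, it is dimensionless and can be compared across resolution shells, which is why the text can quote it at each stage of the 4 Å to 2 Å protocol.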

As a control, the initial NMR-derived structure was refined without rebuilding by a restrained least-squares method (Konnert and Hendrickson, 1980), starting at 4 Å resolution and then increasing the resolution to 3 Å and finally to 2 Å. The R factor dropped to 0.381, but the very poor stereochemistry and the large deviation from the manually refined structure indicate that this structure had not converged to the correct result; residues 34 to 40 had not moved, and substantial model-building would be required to correct the structure. Thus, restrained least-squares refinement in the absence of model-building did not produce the large conformational changes that occurred in MD-refinement by simulated annealing. With a version of the molecular dynamics program CHARMM optimized for x-ray refinement (the program X-PLOR; Brünger et al., 1989), many applications of simulated annealing have been made, and the method has been shown to be of considerable utility in decreasing the human effort involved (e.g., Navia et al., 1989).
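
The contrast drawn above, between local minimization, which stays in the nearest energy minimum, and simulated annealing, which can cross barriers, can be illustrated on a one-dimensional toy energy. This is a minimal sketch only: it uses Metropolis Monte Carlo annealing rather than the molecular dynamics annealing of X-PLOR, and the tilted double-well function, the geometric cooling schedule, and all step sizes are illustrative assumptions.

```python
import math
import random


def energy(x: float) -> float:
    """Tilted double well: local minimum near x = +1, global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x: float) -> float:
    return 4.0 * x * (x * x - 1.0) + 0.3


def steepest_descent(x: float, step: float = 0.01, n_steps: int = 5000) -> float:
    """Local minimization: only downhill moves, so it cannot leave the starting well."""
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


def anneal(x: float, rng: random.Random, t_start: float = 2.0, t_end: float = 0.01,
           n_steps: int = 20000, max_move: float = 0.5) -> float:
    """Metropolis Monte Carlo with geometric cooling from t_start to t_end."""
    cooling = (t_end / t_start) ** (1.0 / n_steps)
    t = t_start
    e = energy(x)
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_move, max_move)
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
        t *= cooling
    return x


if __name__ == "__main__":
    start = 1.2  # starting point inside the higher-energy (local) well
    x_local = steepest_descent(start)
    rng = random.Random(0)
    # Keep the lowest-energy result of a few independent annealing runs.
    x_best = min((anneal(start, rng) for _ in range(5)), key=energy)
    print(f"descent:   x = {x_local:+.3f}, E = {energy(x_local):+.3f}")
    print(f"annealing: x = {x_best:+.3f}, E = {energy(x_best):+.3f}")
```

The high starting temperature lets the walker sample both wells; slow cooling then leaves it, with high probability, in the lower one. Taking the best of several runs mirrors the practice, noted in the NMR section below, of judging annealed structures by their final energy.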

USE OF NUCLEAR MAGNETIC RESONANCE DATA FOR DYNAMICS AND STRUCTURE

Nuclear magnetic resonance (NMR) is an experimental technique that has played an essential role in the analysis of the internal motions of proteins (Campbell et al., 1978; Gurd and Rothgeb, 1979; Dobson and Karplus, 1986). Like x-ray diffraction, it can provide information about individual atoms; unlike x-ray diffraction, NMR is sensitive not only to the magnitude but also to the time scales of the motions. Most nuclear relaxation processes depend on atomic motions on the nanosecond to picosecond time scale. Although molecular tumbling is generally the dominant relaxation mechanism for proteins in solution, internal motions contribute as well; for solids, the internal motions are of primary importance. In addition, NMR parameters, such as nuclear spin-spin coupling constants and chemical shifts, depend on the protein environment. In many cases different local conformations exist, but their interconversion is rapid on the NMR time scale (here on the order of milliseconds), so that average values are observed. When the interconversion time is on the order of the NMR time scale or slower, the transition rates can be studied by NMR; an example is provided by the reorientation of aromatic rings (Campbell et al., 1976; Brooks et al., 1987).

In addition to supplying data on the dynamics of proteins, NMR can also be used to obtain structural information. With recent advances in techniques it is now possible to obtain a large number of approximate interproton distances for proteins from nuclear Overhauser effect (NOE) measurements (Noggle and Schirmer, 1971). If the protein is relatively small and has a well-resolved spectrum, a large proportion of the protons can be assigned, and several hundred distances between these protons can be determined by two-dimensional NMR techniques (Wagner and Wüthrich, 1982). Clearly, these distances can provide structural information for proteins, analogous to their earlier use for organic molecules (Noggle and Schirmer, 1971; Honig et al., 1971). Of great interest is the demonstration that enough distance information can be measured to determine the high-resolution structure of a protein in solution. In the last few years it has been shown how such NMR structures can supplement results from x-ray crystallography, particularly for proteins that are difficult to crystallize (Wüthrich, 1989).

In what follows we consider two questions related to structure determination. The first concerns the effect of motional averaging on the accuracy of the apparent distances obtained from NOE studies; the second concerns the use of molecular dynamics simulated annealing to obtain structural results from the NOE data.

For spin-lattice relaxation, such as is observed in nuclear Overhauser effect measurements, the behavior of the magnetization of the nuclei being studied can be expressed by the equation (Olejniczak et al., 1984; Solomon, 1955)

$$\frac{d\left[I_z(t)_i - I_{0i}\right]}{dt} = -\rho_i\left[I_z(t)_i - I_{0i}\right] - \sum_{j \neq i} \sigma_{ij}\left[I_z(t)_j - I_{0j}\right] \qquad (8)$$

where $I_z(t)_i$ and $I_{0i}$ are the z components of the magnetization of nucleus i at time t and at equilibrium, $\rho_i$ is the direct relaxation rate of nucleus i, and $\sigma_{ij}$ is the cross-relaxation rate between nuclei i and j. The quantities $\rho_i$ and $\sigma_{ij}$ can be expressed in terms of spectral densities:

$$\rho_i = \frac{6\pi}{5}\hbar^2 \sum_{j \neq i} \gamma_i^2\gamma_j^2\left[\tfrac{1}{3}J_{ij}(\omega_i - \omega_j) + J_{ij}(\omega_i) + 2J_{ij}(\omega_i + \omega_j)\right]$$

$$\sigma_{ij} = \frac{6\pi}{5}\gamma_i^2\gamma_j^2\hbar^2\left[2J_{ij}(\omega_i + \omega_j) - \tfrac{1}{3}J_{ij}(\omega_i - \omega_j)\right] \qquad (9)$$

where $\omega_i$ is the resonance frequency of nucleus i. The spectral density functions can be obtained from the correlation functions for the relative motions of the nuclei with spins i and j (Olejniczak et al., 1984; Levy et al., 1981),

$$J_{ij}(\omega) = \int_0^\infty \left\langle \frac{Y_{2m}\big(\theta_{ij}^{lab}(0), \phi_{ij}^{lab}(0)\big)\, Y_{2m}^{*}\big(\theta_{ij}^{lab}(t), \phi_{ij}^{lab}(t)\big)}{r_{ij}^3(0)\, r_{ij}^3(t)} \right\rangle \cos(\omega t)\, dt \qquad (10)$$

where $Y_{2m}(\theta(t), \phi(t))$ are second-order spherical harmonics and the angular brackets represent an ensemble average, which is approximated by an integral over the molecular dynamics trajectory. The quantities $\theta_{ij}^{lab}(t)$ and $\phi_{ij}^{lab}(t)$ are the polar angles at time t of the internuclear vector between protons i and j with respect to the external magnetic field, and $r_{ij}$ is the interproton distance. In the simplest case of a rigid molecule undergoing isotropic tumbling with a correlation time $\tau_0$, this reduces to the familiar expression

$$J_{ij}(\omega) = \frac{1}{4\pi r_{ij}^6}\,\frac{\tau_0}{1 + \omega^2\tau_0^2} \qquad (11)$$

The nuclear Overhauser effect corresponds to the selective enhancement of a given resonance by the irradiation of another resonance in a dipolar-coupled spin system. Of particular interest for obtaining motional and distance information are measurements of time-dependent NOEs, from which the cross-relaxation rates $\sigma_{ij}$ (see Eq. 9) can be determined, either directly or indirectly by solving a set of coupled equations (Eqs. 8 and 9). Motions on the picosecond time scale are expected to introduce averaging effects that decrease the cross-relaxation rates by a scale factor relative to the rigid model. A lysozyme molecular dynamics simulation (Ichiye et al., 1986) has been used to calculate dipole vector correlation functions (Olejniczak et al., 1984) for proton pairs that have been studied experimentally (Olejniczak et al., 1981; Poulsen et al., 1980). Four proton pairs on three sidechains (Trp 28, Ile 98 and Met 105) with very different motional properties were examined. Trp 28 is quite rigid, Ile 98 has significant fluctuations, and Met 105 is particularly mobile in that it jumps among different side-chain conformations during the simulation. The rank order of the scale factors (order parameters) is the same in the theoretical and experimental results. However, although the results for the Trp 28 protons agree with the measurements to within the experimental error, for both Ile 98 and Met 105 the motional averaging found from the NOEs is significantly greater than the calculated value. This suggests that these residues undergo rare fluctuations involving transitions that are not adequately sampled by the simulation.
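
For the rigid, isotropically tumbling reference model, the orientational average in Eq. (10) gives the Lorentzian J_ij(ω) = τ0 / [4π r_ij^6 (1 + ω²τ0²)], so the rates of Eq. (9) and the time-dependent NOE of Eq. (8) can be computed directly. The minimal sketch below does this for a single proton pair, assuming γ = ħ = 1, equal resonance frequencies (ω_i ≈ ω_j), arbitrary time units, and a simple explicit Euler integration; the parameter values are illustrative and are not taken from the lysozyme study.

```python
import math
from typing import Tuple


def spectral_density(omega: float, tau0: float, r: float) -> float:
    """Rigid isotropic-tumbling limit of Eq. (10): tau0 / (4 pi r^6 (1 + omega^2 tau0^2))."""
    return tau0 / (4.0 * math.pi * r ** 6 * (1.0 + (omega * tau0) ** 2))


def relaxation_rates(omega: float, tau0: float, r: float) -> Tuple[float, float]:
    """Direct rate rho and cross-relaxation rate sigma of Eq. (9) for one proton pair.

    Assumes gamma = hbar = 1 and equal resonance frequencies, so that
    omega_i - omega_j = 0 and omega_i + omega_j = 2 * omega.
    """
    prefactor = 6.0 * math.pi / 5.0
    j_diff = spectral_density(0.0, tau0, r)          # J(omega_i - omega_j)
    j_single = spectral_density(omega, tau0, r)      # J(omega_i)
    j_sum = spectral_density(2.0 * omega, tau0, r)   # J(omega_i + omega_j)
    rho = prefactor * (j_diff / 3.0 + j_single + 2.0 * j_sum)
    sigma = prefactor * (2.0 * j_sum - j_diff / 3.0)
    return rho, sigma


def transient_noe(rho: float, sigma: float, t_max: float,
                  dt: float = 1e-4) -> Tuple[float, float]:
    """Explicit Euler integration of Eq. (8) for two spins after spin 1 is inverted.

    Returns the deviations I_z - I_0 of spins 1 and 2 at t_max, with I_0 = 1.
    """
    d1, d2 = -2.0, 0.0  # spin 1 inverted (I_z = -I_0), spin 2 at equilibrium
    for _ in range(int(round(t_max / dt))):
        rate1 = -rho * d1 - sigma * d2
        rate2 = -rho * d2 - sigma * d1
        d1, d2 = d1 + dt * rate1, d2 + dt * rate2
    return d1, d2


if __name__ == "__main__":
    for tau0 in (0.1, 10.0):  # fast and slow tumbling relative to omega = 1
        rho, sigma = relaxation_rates(1.0, tau0, 1.0)
        _, d2 = transient_noe(rho, sigma, 0.05)
        print(f"tau0 = {tau0}: rho = {rho:.4f}, sigma = {sigma:+.4f}, spin-2 change = {d2:+.5f}")
```

In this model σ changes sign at ωτ0 = √5/2: small, rapidly tumbling molecules give positive enhancements, whereas slowly tumbling proteins give negative ones, and the initial build-up on spin 2 is proportional to σ, hence to r^-6.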

If nuclear Overhauser effects are measured between pairs of protons whose distance is not fixed by the structure of a residue, the strong distance dependence of the cross-relaxation rates (1/r^6) can be used to obtain estimates of the interproton distances (Poulsen et al., 1980; Olejniczak et al., 1981; Wagner and Wüthrich, 1982; Clore et al., 1985). The simplest application of this approach is to assume that proteins are rigid and tumble isotropically. The lysozyme molecular dynamics simulation was used to determine whether picosecond fluctuations are likely to introduce important errors into such an analysis (Olejniczak et al., 1984). The result shows that the presence of the motions causes a general decrease in most NOE effects observed in a protein. However, because the distance depends on the sixth root of the observed NOE, motional errors of a factor of two in the latter lead to only a 12% uncertainty in the distance. Thus, the decrease is usually too small to produce a significant change in the distance estimated from the measured NOE value. This is consistent with the excellent correlation found between experimental NOE values and those calculated using distances from a crystal structure (Poulsen et al., 1980). Specific NOEs can, however, be altered by the internal motions to such a degree that the effective distances obtained are considerably different from those predicted for a static structure. Such possibilities must, therefore, be considered in any structure determination based on NOE data. This is particularly true for cases involving averaging over large fluctuations, such as may occur for external sidechains and mobile loop regions.
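
Because the cross-relaxation rate scales as r^-6, the apparent distance scales as the inverse sixth root of the measured NOE. The sketch below shows the arithmetic behind the 12% figure, together with a calibration against a reference proton pair of known separation. The calibration scheme (often called the isolated-spin-pair approximation) is an assumption not spelled out in the text, and the 1.8 Å reference distance and the rates are hypothetical values chosen for illustration.

```python
def distance_from_noe(sigma: float, sigma_ref: float, r_ref: float) -> float:
    """Isolated-spin-pair estimate r = r_ref * (|sigma_ref| / |sigma|) ** (1/6)."""
    s, s_ref = abs(sigma), abs(sigma_ref)
    if s == 0.0 or s_ref == 0.0:
        raise ValueError("cross-relaxation rates must be non-zero")
    return r_ref * (s_ref / s) ** (1.0 / 6.0)


def relative_distance_error(noe_factor: float) -> float:
    """Fractional change in apparent distance when the NOE is off by noe_factor."""
    return noe_factor ** (1.0 / 6.0) - 1.0


if __name__ == "__main__":
    # A factor-of-two error in the NOE changes the apparent distance by only about 12%.
    print(f"{relative_distance_error(2.0):.1%}")
    # Hypothetical calibration: reference pair at 1.8 Å; a 64-fold weaker rate doubles the distance.
    print(f"{distance_from_noe(1.0, 64.0, 1.8):.2f} Å")
```

The same insensitivity cuts both ways: it makes rigid-model distances robust to modest motional averaging, but it also means that even large changes in the NOE only weakly constrain distances beyond about 5 Å.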

Because of the inverse-sixth-power distance dependence of the NOE, experimental data so far are limited to protons that are separated by less than 5 Å. Thus, the information required for a direct protein structure determination is not available. To overcome this limitation it is possible to introduce additional information provided by empirical energy functions (Brooks et al., 1983). One way of proceeding is to perform molecular dynamics simulated annealing with the approximate interproton distances introduced as restraints in the form of skewed biharmonic potentials (Clore et al., 1985; Brünger et al., 1986); the force constants can be chosen to correspond to the experimental uncertainty in the distance.
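
A skewed biharmonic restraint is harmonic on both sides of the target distance but with different force constants, so that asymmetric error bars on an NOE-derived distance can be represented. In the minimal sketch below, each force constant is chosen, as an illustrative assumption, so that a deviation equal to the corresponding error bar costs k_BT/2, i.e. c = k_BT / (2Δ²); this convention and the class and parameter names are hypothetical, not the exact settings of the cited work.

```python
from dataclasses import dataclass
from typing import Tuple

KB_KCAL = 0.0019872  # Boltzmann constant in kcal/(mol K)


@dataclass
class SkewedBiharmonic:
    r0: float           # target interproton distance (Å)
    delta_plus: float   # uncertainty above r0 (Å)
    delta_minus: float  # uncertainty below r0 (Å)
    temperature: float = 300.0  # K

    def force_constants(self) -> Tuple[float, float]:
        """Assumed convention: a one-error-bar deviation costs kT/2 on either side."""
        kt = KB_KCAL * self.temperature
        return 0.5 * kt / self.delta_plus ** 2, 0.5 * kt / self.delta_minus ** 2

    def _constant(self, r: float) -> float:
        c_plus, c_minus = self.force_constants()
        return c_plus if r > self.r0 else c_minus

    def energy(self, r: float) -> float:
        """Restraint energy in kcal/mol."""
        return self._constant(r) * (r - self.r0) ** 2

    def force(self, r: float) -> float:
        """-dE/dr: negative (pulls inward) above r0, positive (pushes outward) below r0."""
        return -2.0 * self._constant(r) * (r - self.r0)


if __name__ == "__main__":
    restraint = SkewedBiharmonic(r0=3.0, delta_plus=1.0, delta_minus=0.5)
    for r in (2.5, 3.0, 3.5, 4.0):
        print(f"r = {r:.1f} Å  E = {restraint.energy(r):.3f} kcal/mol  F = {restraint.force(r):+.3f}")
```

Making the lower-bound side stiffer reflects the fact that an NOE more reliably excludes large separations than it fixes a close one, which is one reason for allowing c_plus and c_minus to differ.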

A model study of the small protein crambin, which is composed of 46 residues, was made with realistic NOE restraints (Brünger et al., 1986). Two hundred forty approximate interproton distances shorter than 4 Å were used, including 184 short-range distances (i.e., those connecting protons in two residues that are fewer than 5 residues apart in the sequence) and 56 long-range distances. The molecular dynamics simulations converged to the known crambin structure (Hendrickson and Teeter, 1981) from different initial extended structures. The average structure obtained from the simulations with a series of different protocols had rms deviations of 1.3 Å for the backbone atoms and 1.9 Å for the sidechain atoms. Individual converged simulations had rms deviations in the range 1.5 to 2.1 Å and 2.1 to 2.8 Å for the backbone and sidechain atoms, respectively. Further, it was shown that a dynamics structure with significantly larger deviations (5.7 Å) could be identified as incorrect, independently of any knowledge of the crystal structure, because of its higher energy and because the NOE restraints were not satisfied within the limits of error. The incorrect structure resulted when all NOE restraints were introduced simultaneously, rather than allowing the dynamics to proceed first in the presence of only the short-range restraints, followed by introduction of the long-range restraints. Also of interest is the fact that, although crambin has three disulfide bridges, it was not necessary to introduce information concerning them to obtain an accurate structure.
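
The rms deviations quoted above compare two structures after they have been optimally superimposed. A common way to do this, not necessarily the procedure used in the cited study, is the Kabsch algorithm, sketched here with NumPy for two N x 3 arrays of matched atoms (for example, backbone atoms of a simulated structure and of the crystal structure). The function name is illustrative.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two N x 3 coordinate sets after optimal rigid-body superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays of matched atoms")
    # Remove the translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the 3 x 3 covariance matrix yields the optimal rotation.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = rng.normal(size=(46, 3))  # stand-in coordinates, one point per residue
    theta = 0.5
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    reference = model @ rot.T + np.array([1.0, -2.0, 3.0])
    print(kabsch_rmsd(model, reference))  # essentially zero: same shape, different pose
```

Superposition removes overall rotation and translation, so the reported deviation reflects only differences in shape; without it, a correctly folded structure in a different orientation would appear grossly wrong.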

The folding process as simulated by the restrained dynamics is very rapid. At the end of the first 2 ps the secondary structure is essentially established, while the molecule is still in an extended conformation. Some tertiary folding occurs even in the absence of long-range restraints. When they are introduced, it takes about 5 ps to obtain a tertiary structure that is approximately correct and another 6 ps to make the small adjustments required to converge to the final structure.

It is of interest to consider the relation between the results obtained in the restrained dynamics simulation and actual protein folding. That correctly folded structures are achieved only when the secondary structural elements are at least partly formed before the tertiary restraints are introduced is suggestive of the diffusion-collision model of protein folding (Bashford et al., 1984). Clearly, the specific pathway has no physical meaning, since it is dominated by the NOE restraints. Also, the simulated folding process is 12 orders of magnitude faster than experimental estimates. About 6 to 9 orders of magnitude of the rate increase are due to the fact that the secondary structure is stable once it is formed, in contrast to real proteins, where the secondary structural elements spend only a small fraction of time in the native conformation until coalescence has occurred. The remainder of the artificial rate increase presumably arises from the fact that the protein follows a single, fairly direct path to the folded state in the presence of the NOE restraints, instead of having to go through a complex search process.

Many applications of NMR data to structure determination have been made. Both distance geometry methods and molecular dynamics have been employed for reducing the data (Wüthrich, 1989; Clore and Gronenborn, 1989).

STRUCTURAL ROLE OF ACTIVE-SITE WATERS IN RIBONUCLEASE A

To achieve a realistic treatment of solvent-accessible active sites, a molecular dynamics simulation method, called the stochastic boundary method, has been implemented (Brooks and Karplus, 1983; Brünger et al., 1984; Brooks et al., 1985). It makes possible the simulation of a localized region, approximately spherical in shape, that is composed of the active site with or without ligands, the essential portions of the protein in the neighborhood of the active site, and the surrounding solvent. The approach provides a simple and convenient method for reducing the total number of atoms included in the simulation, while avoiding spurious edge effects.

The stochastic boundary method for solvated proteins starts with a known x-ray structure; for the present problem the refined high-resolution (1.5 to 2 Å) x-ray structures provided by Petsko and coworkers were used (G. Petsko, private communication). The region of interest (here the active site of ribonuclease A) was defined by choosing a reference point (taken at the position of the phosphorus atom in the CpA inhibitor complex) and constructing a sphere of 12 Å radius around this point. Space within the sphere not occupied by crystallographically determined atoms was filled by water molecules, introduced from an equilibrated sample of liquid water. The 12-Å sphere was further subdivided into a reaction region (10 Å radius) treated by full molecular dynamics and a buffer region (the volume between 10 and 12 Å) treated by Langevin dynamics, in which Newton's equations of motion for the nonhydrogen atoms are augmented by a frictional term and a random-force term; these additional terms approximate the effects of the neglected parts of the system and permit energy transfer into and out of the reaction region. Water molecules diffuse freely between the reaction and buffer regions but are prevented from escaping by an average boundary force (Brünger et al., 1984). The protein atoms in the buffer region are constrained by harmonic forces derived from crystallographic temperature factors (Brooks et al., 1985). The forces on the atoms and their dynamics were calculated with the CHARMM program (Brooks et al., 1983); the water molecules were represented by the ST2 model (Stillinger and Rahman, 1974).
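
The partition into reaction and buffer regions, and the Langevin update applied to buffer atoms, can be sketched as follows. This is a minimal illustration under stated assumptions: a simple Euler-Maruyama step for one velocity component, a friction coefficient gamma, and a Gaussian random force whose variance follows the fluctuation-dissipation relation. The function names, units, and integrator are illustrative and are not the CHARMM implementation; the boundary force and the harmonic constraints on buffer protein atoms are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def classify(coords: Sequence[Vec], center: Vec, r_reaction: float = 10.0,
             r_buffer: float = 12.0) -> List[str]:
    """Label each atom 'reaction', 'buffer', or 'outside' by its distance from center (Å)."""
    labels = []
    for atom in coords:
        d = math.dist(atom, center)
        if d <= r_reaction:
            labels.append("reaction")
        elif d <= r_buffer:
            labels.append("buffer")
        else:
            labels.append("outside")
    return labels


def langevin_velocity_step(v: float, force: float, mass: float, gamma: float,
                           kT: float, dt: float, rng: random.Random) -> float:
    """One Euler-Maruyama step of m dv/dt = F - m*gamma*v + R(t) for one velocity component.

    The random force has variance 2*m*gamma*kT/dt per step (fluctuation-dissipation);
    with gamma = 0 the update reduces to Newton's equation.
    """
    random_force = math.sqrt(2.0 * mass * gamma * kT / dt) * rng.gauss(0.0, 1.0)
    accel = (force - mass * gamma * v + random_force) / mass
    return v + accel * dt


if __name__ == "__main__":
    print(classify([(0.0, 0.0, 0.0), (11.0, 0.0, 0.0), (0.0, 0.0, 13.0)], (0.0, 0.0, 0.0)))
    rng = random.Random(42)
    v, acc, n = 0.0, 0.0, 100000
    for _ in range(n):
        v = langevin_velocity_step(v, 0.0, 1.0, 1.0, 1.0, 0.01, rng)
        acc += v * v
    print("<v^2> =", acc / n, "(target kT/m = 1)")
```

The friction drains energy and the random force restores it, so buffer atoms act as a heat bath at temperature T for the reaction region rather than as a rigid or vacuum boundary.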

One of the striking aspects of the active site of ribonuclease is the presence of a large number of positively charged groups, some of which may be involved in guiding and/or binding the substrate (Matthew and Richards, 1982). The simulation demonstrated that these residues are stabilized in the absence of ligands by well-defined water networks. A particular example includes Lys-7, Lys-41, Lys-66, Arg-39 and the doubly protonated His-119. Bridging waters, some of which are organized into trigonal bipyramidal structures, were found to stabilize the otherwise very unfavorable configuration of near-neighbor positive groups, because the interaction energy between water and the charged C-NHₙ⁺ (n = 1, 2, or 3) moieties is very large; e.g., at a donor-acceptor distance of 2.8 Å, the C-NH₃⁺···H₂O energy is -19 kcal/mole with the empirical potential used for the simulation (Brooks et al., 1983), in approximate agreement with accurate quantum-mechanical calculations (Desmeules and Allen, 1980) and gas-phase ion-molecule data (Kebarle, 1977). The average stabilization energy of the charged groups (Lys-7, Lys-41, Lys-66, Arg-39 and His-119) and the 106 water molecules included in the simulation is -376.6 kcal/mole. This energy is calculated as the difference between the simulated system and a system composed of separate protein and bulk water. Unfavorable protein-protein charged-group interactions are balanced by favorable water-protein and water-water interactions. The average energy per molecule of pure water from an equivalent stochastic boundary simulation (Brünger et al., 1984) was -9.0 kcal/mole, whereas that of the waters included in the active-site simulation was -10.2 kcal/mole; in the latter, a large contribution to the energy came from the interactions between the water molecules and the protein atoms. Such energy differences are essential to a correct evaluation of binding equilibria and of the changes introduced by site-specific mutagenesis (Fersht et al., 1985).

During the simulation, the water molecules involved in the charged-group interactions oscillated around their average positions, generally without exchanging. On a longer time scale, the waters would be expected to exchange and the sidechains to undergo larger-scale displacements. This is in accord with the disorder found in the x-ray results for lysine and arginine residues (e.g., Lys-41 and Arg-39) (Gilbert et al., to be published; Wlodawer, 1985), a fact that makes a crystallographic determination of the water structure difficult in this case. It is also of interest that Lys-7 and Lys-41 have an average separation of only 4 Å in the simulation, less than that found in the x-ray structure. That this like-charged pair can exist in such a configuration is corroborated by experiments showing that the two lysines can be cross-linked (Marfey et al., 1965); the structure of this compound has been reported recently (Weber et al., 1985) and is similar to that found in the native protein.

In addition to the role of water in stabilizing the charged groups that span the active site and participate in catalysis, water molecules make hydrogen bonds to protein polar groups that become involved in ligand binding. A particularly clear example is provided by the adenine-binding site in the CpA simulation. The NH₂ group of adenine acted as a donor, making hydrogen bonds to the carbonyl of Asn-67, and the ring N1A of adenine acted as an acceptor for a hydrogen bond from the amide group of Glu-69. Corresponding hydrogen bonds were present in the free ribonuclease simulation, with appropriately bound water molecules replacing the substrate. These waters, and those that interact with the pyrimidine-site residues Thr-45 and Ser-123, help to preserve the protein structure in the optimal arrangement for binding. Similar substrate "mimicry" has been observed in x-ray structures of lysozyme (Blake et al., 1983) and of penicillopepsin (James and Sielecki, 1983) but has not yet been seen in ribonuclease.

A COOPERATIVITY MUTANT IN HEMOGLOBIN

Hemoglobin has long been a subject of experimental and theoretical studies because it is the classic example of cooperativity in biological systems. Since the determination of the x-ray structures of the unliganded (deoxy) and liganded (oxy) tetramers by Perutz and coworkers (Perutz, 1970; Fermi and Perutz, 1981), attention has been focused on the atomic details of the cooperative mechanism. In particular, structural and thermodynamic measurements on native, mutant, and modified hemoglobins have been used in attempts to isolate the essential amino acids and to determine their contributions to cooperativity. It has been suggested on the basis of such studies (Pettigrew et al., 1982; Perutz, 1970) and theoretical analyses (Gelin et al., 1983) that the interactions between the C helix of one chain and the FG corner region of another (α1C-β2FG) play an important role in coupling the relative stabilities of the quaternary structures of the tetramer to the tertiary changes induced in the allosteric core by ligand binding to individual subunits. Asp β99 (G1) is one of the residues that have been studied in great detail. A series of naturally occurring mutants at this position all have significantly reduced cooperativity and increased oxygen affinity relative to normal hemoglobin (Dickerson and Geis, 1983; Bunn and Forget, 1980). From a comparison of the deoxy and oxy normal hemoglobin tetramer structures, it has been suggested that the essential role of Asp β99 (G1) is to stabilize the deoxy tetramer by making hydrogen bonds to Tyr α42 (C7) and to Asn α97 (G4), which are absent in the oxy tetramer (Morimoto et al., 1971). It is now possible to supplement such observational conclusions by free energy simulations. We here employ the simulation method to show that the observed free energy changes result from interactions of Asp β99 (G1) with several amino acids and with the solvent. Although both Tyr α42 (C7) and Asn α97 (G4) are found to play a significant role, other interactions are shown to be of equal or greater importance. In what follows, we focus on the mutant Asp β99 (G1) to Ala (Hb Radcliffe) (Weatherall et al., 1977), which is of the "deletion" type (Fersht, 1987) and is therefore expected to lead to only localized structural changes (Shih et al., 1985) that are simplest to interpret.

The free energy difference $\Delta G$ between two states A and B (here A corresponds to normal hemoglobin and B to a mutant hemoglobin) is obtained by thermodynamic integration with the formula (Kirkwood, 1935; Kirkwood, 1942; Fleischman et al., 1990)

$$\Delta G = \int_0^1 \langle \Delta V \rangle_\lambda \, d\lambda \qquad (12)$$

where $\Delta V = V_B - V_A$ and $\lambda$ is a parameter such that $V_\lambda = (1-\lambda)V_A + \lambda V_B$; the quantities $V_A$ and $V_B$ are empirical energy functions describing, respectively, the normal and the mutant hemoglobin molecule system. The essential part of the calculation is the evaluation of the thermodynamic average $\langle \Delta V \rangle_\lambda$, where the subscript $\lambda$ implies that the average is over the hybrid system described by $V_\lambda$. For calculating the integral (Eq. 12), a series of $\lambda$ values is used (Fleischman et al., 1990; Brooks et al., 1983). A stochastic boundary simulation, which followed the procedure described previously (Brünger et al., 1985), was employed to obtain the required averages.
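
Equation (12) can be evaluated numerically once ⟨ΔV⟩_λ is known at a series of λ values. The sketch below checks such a quadrature on a toy system with a known answer: a one-dimensional harmonic oscillator whose force constant changes from k_A to k_B. For this system ⟨ΔV⟩_λ = (k_B - k_A) k_BT / (2 k_λ), with k_λ = (1 - λ)k_A + λk_B, and the exact result is ΔG = (k_BT/2) ln(k_B/k_A). In a real free energy simulation the averages come from sampling the hybrid potential V_λ; the toy model, its analytic averages, and the use of Simpson's rule are illustrative choices, not the procedure of the cited work.

```python
import math
from typing import Callable


def simpson(f: Callable[[float], float], n: int = 20) -> float:
    """Composite Simpson's rule for the integral of f over [0, 1]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0


def mean_delta_v_harmonic(lam: float, k_a: float, k_b: float, kT: float) -> float:
    """<V_B - V_A>_lambda for V = k x^2 / 2, using <x^2>_lambda = kT / k_lambda."""
    k_lam = (1.0 - lam) * k_a + lam * k_b
    return 0.5 * (k_b - k_a) * kT / k_lam


def ti_free_energy(k_a: float, k_b: float, kT: float, n: int = 20) -> float:
    """Delta G = integral over lambda of <Delta V>_lambda (Eq. 12), by quadrature."""
    return simpson(lambda lam: mean_delta_v_harmonic(lam, k_a, k_b, kT), n)


if __name__ == "__main__":
    k_a, k_b, kT = 1.0, 4.0, 0.596  # kT in kcal/mol near 300 K; force constants arbitrary
    print("TI estimate:", ti_free_energy(k_a, k_b, kT))
    print("exact:      ", 0.5 * kT * math.log(k_b / k_a))
```

The integrand is largest where the hybrid system is softest, so more λ points (or a non-uniform spacing) are needed where ⟨ΔV⟩_λ changes rapidly; in a simulation each point also carries sampling noise that the quadrature does not remove.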

To determine the effect of the mutation Asp β99 (G1) → Ala on cooperativity, the free energy changes of the deoxy and oxy tetramers resulting from this mutation have been calculated by Eq. (12) (Gao et al., 1989) (see Table 1). Both the deoxy and the oxy tetramer are destabilized by the mutation (66.0 and 60.5 kcal/mole per interface, respectively), but it is the deoxy tetramer that is more destabilized, leading to the reduced cooperativity and increased ligand affinity. The difference between them (5.5 kcal/mole) can be compared with the measurements of Ackers et al. (3.4 kcal/mole; private communication). The experimental and theoretical results have the same sign and are of the same order of magnitude, suggesting that the simulation may be meaningfully analyzed to obtain insight into the interactions that contribute to the free energy differences.

To analyze the results, we make use of the fact that, because of the linear form of Eq. (12), the free energy can be decomposed into contributions from interactions between the mutated residue and any other residues or water molecules; in all cases, we consider the change in the free energy induced by the mutation. The change in the solvent interactions, which are essentially electrostatic, is more destabilizing for the oxy than for the deoxy tetramer. This is in accord with the x-ray structures, since in the oxy tetramer the Asp sidechain is more exposed than in the deoxy tetramer. With respect to the protein interactions, the mutation stabilizes the oxy tetramer and destabilizes the deoxy tetramer. There are both inter- and intrasubunit contributions. As to the intersubunit terms, Tyr α42 (C7) does indeed stabilize the deoxy form, in accord with the analysis of Morimoto et al. (1971). By contrast, Asn α97 (G4) favors the oxy form by a relatively small amount. The interaction with Asp α94 (G1) is unfavorable in both the deoxy and oxy forms; i.e., the free energy of interaction between Asp α94 and Asp β99 is destabilizing in both tetramers, so that the replacement by the nonpolar Ala stabilizes the deoxy tetramer. Also of interest are the contributions that arise from within the β2 subunit, which are by definition the result of tertiary structural changes that accompany the quaternary transition. All the residues involved are close to the mutated residue Asp β99, and the largest contribution involves Glu β101. Apparently, the Asp β99/Glu β101 interaction is stabilizing in the oxy tetramer and destabilizing in the deoxy tetramer.

Table 1

Free Energy for the Mutation Asp G1(99)β → Ala (a)

Contribution            ΔG(deoxy)   ΔG(oxy)   ΔΔG(oxy-deoxy)
Solvent                     46.0       68.5         22.5
Protein (b)                 20.0       -8.0        -28.0
  Asp G1(99)β2 (c)          -8.8      -11.0         -2.2
  Inter (α1)                 2.8      -24.4        -27.2
    Tyr C7(42)               8.4       -4.3        -12.7
    Asp G1(94)             -22.0      -44.4        -22.4
    Val G3(96)               1.6        7.1          5.5
    Asn G4(97)               9.7       13.0          3.3
  Intra (β2)                26.1       27.4          1.3
    His FG4(97)             -2.1       -3.3         -1.2
    Pro G2(100)              8.2        5.4         -2.8
    Glu G3(101)            -11.2        6.9         18.1
    Asn G4(102)             14.3       10.1         -4.2
TOTAL                       66.0       60.5         -5.5

(a) All values, in kcal/mole, are given for one α1β2 interface; a term in ΔG with a positive sign means that the given contribution destabilizes the mutant (Ala) relative to the wild type (Asp). When the effect of the Asp residue by itself is discussed in the text, stabilizing contributions have a positive sign.

(b) Only the residues that contribute more than 7.5 kcal/mole to both the deoxy and oxy forms are listed.

(c) Internal energy contribution.
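
Because Eq. (12) is linear in ΔV, the individual contributions in Table 1 add, and each ΔΔG entry should equal ΔG(oxy) - ΔG(deoxy). The short check below redoes that bookkeeping for the solvent, protein, total, and individual-residue rows of Table 1 (kcal/mole per α1β2 interface); the dictionary layout and function names are illustrative.

```python
from typing import Dict, List, Tuple

# (deoxy, oxy, oxy - deoxy) in kcal/mole per alpha1-beta2 interface, from Table 1.
TABLE_1: Dict[str, Tuple[float, float, float]] = {
    "Solvent": (46.0, 68.5, 22.5),
    "Protein": (20.0, -8.0, -28.0),
    "Tyr C7(42)": (8.4, -4.3, -12.7),
    "Asp G1(94)": (-22.0, -44.4, -22.4),
    "Val G3(96)": (1.6, 7.1, 5.5),
    "Asn G4(97)": (9.7, 13.0, 3.3),
    "His FG4(97)": (-2.1, -3.3, -1.2),
    "Pro G2(100)": (8.2, 5.4, -2.8),
    "Glu G3(101)": (-11.2, 6.9, 18.1),
    "Asn G4(102)": (14.3, 10.1, -4.2),
    "TOTAL": (66.0, 60.5, -5.5),
}


def inconsistent_rows(table: Dict[str, Tuple[float, float, float]],
                      tol: float = 0.05) -> List[str]:
    """Rows whose third column differs from oxy - deoxy by more than tol."""
    return [name for name, (deoxy, oxy, ddg) in table.items()
            if abs((oxy - deoxy) - ddg) > tol]


def totals_consistent(table: Dict[str, Tuple[float, float, float]],
                      tol: float = 0.05) -> bool:
    """Solvent + protein should reproduce the TOTAL row in every column."""
    solvent, protein, total = table["Solvent"], table["Protein"], table["TOTAL"]
    return all(abs(s + p - t) <= tol for s, p, t in zip(solvent, protein, total))


if __name__ == "__main__":
    print(inconsistent_rows(TABLE_1))  # prints []
    print(totals_consistent(TABLE_1))  # prints True
```

The check makes the point of the discussion concrete: the net -5.5 kcal/mole arises from a +22.5 kcal/mole solvent term and a -28.0 kcal/mole protein term of opposite sign.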
It is evident that the free energy simulations provide new insights into the nature of the interactions in proteins and the possible consequences of mutations. Although free energy simulations are a recent development in molecular dynamics, so that their reliability is not fully established, it is likely that even if the quantitative values obtained here are not correct, the qualitative insights are still of interest. It is clear that a relatively small overall change in free energy may involve contributions from several large terms. Also, the balance between protein-protein and protein-solvent interactions plays an essential role. Finally, the intrasubunit contribution to cooperativity has not been considered previously.

CONCLUSION 

Molecular dynamics is now playing an important role in the study of the properties of macromolecules of biological interest. It is also being used effectively in the analysis of experimental data and, in particular, has been shown to provide a new approach to structure determination by NMR and x-ray crystallography. Because molecular dynamics simulations are relatively new, they have so far been employed primarily by theoreticians. It is to be hoped that experimentalists, as well, will begin to use molecular dynamics as a research tool for obtaining a deeper understanding of the biomolecules with which they work.
DISCUSSION


RULLMANN - You showed how a protein is moving in time through a large number of
substates.  Is  it  possible  to  say  anything  about  the  statistical  or  topological  distribution 
of  these  substates,  and  could  such  knowledge  be  used  to  improve  our  understanding 
of  the  behaviour  of  macromolecular  systems  ? 

KARPLUS - As I indicated in my talk, substates that are very similar are accessed on
a subpicosecond timescale. They differ only locally (0.2 Å rms). As the time interval
and  rms  structural  difference  increase,  more  extensive  parts  of  the  protein  are 
involved.  At  the  long-time  limit  (on  the  order  of  150  ps)  any  pair  of  substates  differ  from 
each  other  by  changes  that  are  distributed  throughout  the  protein.  As  yet  it  has  not 
been  possible  to  evaluate  quantitatively  the  relative  energies  of  a  large  enough 
population  of  substates  to  permit  a  statistical  evaluation  of  their  contributions. 


DYMEK - Using 60 CO's to analyze the path of CO from the heme pocket to
outside the myoglobin by "turning off" the CO repulsions is a good idea. But doesn't the
presence  of  a  CO  in  the  myoglobin  affect  its  fluctuations  in  a  way  that  changes  the 
potentials  that  the  other  CO  molecules  see  ? 

KARPLUS  -  In  applying  the  time-dependent  Hartree  approximation  to 
photodissociation  of  CO  myoglobin,  the  assumption  is  made  that  the  effect  of  a  CO 
molecule  on  the  protein  fluctuations  is  small.  An  analysis  of  the  protein  in  the  presence 
of 60 CO molecules suggests that its dynamics are very similar to free myoglobin
dynamics.  It  does  not  appear  to  be  true  that  when  one  CO  goes  over  a  barrier  it  is 
easier  for  a  second  one  to  follow  in  a  correlated  fashion.  All  of  these  results  support  the 
use  of  the  time-dependent  Hartree  method  for  obtaining  CO  pathways  as  a  good  first 
approximation. 


WARSHEL - Basically your study presented the different contributions to the very large
(~70 kcal/mol) energy of charging an ionizable group in a protein. Now while this is
very important, earlier (1984) you argued in response to a related calculation that no
one should calculate such numbers since no one measures these energies.
KARPLUS - It is true that if one can calculate the energy difference between a charged
group in solution and in the protein directly, this permits one to cancel out the large
contribution which comes from solvation of the gas phase ion. Methods based on
continuum electrostatics are useful in that regard. However, now that it is possible to
make many mutations in proteins (which was not true in 1984 when the discussion to
which you refer took place; I presume you are referring to the Molecular Dynamics
Workshop  in  North  Carolina),  it  is  more  important  to  dissect  the  various  contributions 
so  one  can  determine  what  can  happen  when  one  removes  a  charged  group  which 
can  interact  strongly  with  other  charged  groups  in  the  protein. 


GRESH - The existence of a favourable interaction between Glu 101 and the mutated
aspartate is unexpected. Could dielectric screening be responsible for it?

KARPLUS  -  Two  charged  groups  can  have  a  favorable  interaction  (in  the  present 
calculation  this  corresponds  to  a  destabilizing  contribution  when  the  aspartic  acid  is 
changed  to  alanine)  if  the  orientation  of  the  two  groups  is  appropriate.  One  must 
remember  that  one  is  not  simply  dealing  with  a  single  negative  charge,  but  a 
distribution  of  charges  which  adds  up  to  minus  one.  Since  dielectric  screening  comes 
into  the  present  calculation  indirectly  (i.e.,  due  to  the  contribution  of  solvent 
interactions),  the  results  that  I  reported  cannot  be  based  on  this. 


BUCKINGHAM - I have three remarks:

1. Your simulations on the hemoglobin mutants show that the significant free energy
differences  emerge  as  the  differences  between  large  numbers.  How  accurate  do  you 
estimate  these  differences  to  be  ? 

2.  The  hydrogen  bond  is  known  to  be  non-additive,  and  this  is  thought  to  be  due  to 
polarization.  Do  you  incorporate  non-additive  interactions  in  your  computations  ? 

3. One could test for cooperative effects in the interaction of CO molecules with the
protein  by  doing  the  simulation  with  different  numbers  of  CO’s.  If  the  rate  of  reaction 
per  CO  is  independent  of  the  number  of  CO's,  then  the  cooperative  effect  is  negligible. 

KARPLUS - Your first question concerns the accuracy of the overall free energy
change which, as I pointed out, is a difference between large numbers. There is a
significant uncertainty in the overall free energy change (perhaps even of the
magnitude of the free energy difference itself). However, the qualitative features of the
large individual contributions are correctly given by the method. The work which I
described is concerned with demonstrating that these large contributions exist,
and with suggesting that they may appear directly when appropriate mutations are
made.

As to the nonadditivity of hydrogen bonds due to polarization, that is not included in the
force field we are using. There is, however, some nonadditivity of hydrogen bond
interactions with water, in that orientation effects on the water lead to a larger
favorable contribution to the free energy from the second hydrogen bond than from the
first.

Your  idea  for  testing  possible  cooperative  effects  in  the  CO  simulation  by  using 
different  numbers  of  CO's  is  interesting.  It  certainly  would  be  one  way  of  approaching 
the  problem. 


DURUP  -  Back  to  the  calculation  with  the  60  carbon  monoxide  molecules  :  I  think  the 
model  does  not  take  into  account  the  possibility  that  the  CO  molecule  would  force  its 
way  through  the  protein  just  like  you  do  in  a  tight  crowd  with  your  shoulders.  Would  it 
not  be  possible  to  allow,  for  each  time  span  of  a  few  picoseconds,  the  protein  to  feel 
only the interaction with one CO chosen by some criterion (largest interaction energy
at  that  given  moment),  and  thereafter  to  give  its  chance  to  another  CO  molecule,  etc.? 

KARPLUS - It is indeed true, as I pointed out in my talk, that cooperative effects are
neglected.  From  examining  the  simulations,  it  appears  that  the  CO  does  not  "force"  its 
way  through  the  protein,  but  that  most  commonly  the  protein  fluctuations  permit  the  CO 
to  get  through  a  region  which  normally  has  a  high  barrier.  There  are  certainly 
alternative  approaches  which  might  refine  the  methodology  we  employed  in  this 
pioneering  study.  Your  suggestion  describes  one  of  them  and  it  would  be  interesting  to 
implement. 


WIPFF - What would be the effect of the surroundings of the protein on these dynamics?
How  for  instance  do  the  dynamics  of  globin  compare  in  vacuo,  in  the  crystalline  state, 
and  in  aqueous  solution  ? 

KARPLUS  -  The  question  of  the  effect  of  the  surroundings  of  a  protein  on  its  dynamics 
is  one  that  is  often  asked.  From  experiments  such  as  incoherent  neutron  scattering  and 
from  comparisons  of  simulations  in  different  environments,  it  appears  that  the  interior 
motions of a protein are little affected by the environment. The surface motions are
significantly affected, either by interactions with neighboring molecules in the crystal
or with the solvent. From simulations at varying temperatures it appears that the
temperature plays an important direct role in altering the dynamics, though
environmental effects of the temperature (e.g., freezing of the solvent) may also
contribute. We have not made a detailed study of myoglobin dynamics in different
environments.
Abstract


Good  agreement  between  calculated  and  experimental  data  does  not 
necessarily  mean  that  the  underlying  theoretical  model  is  correct.  The  good 
agreement  may  be  due  to  compensation  of  errors.  A  number  of  examples  of  such  a 
situation  are  discussed. 

1.  Introduction 


In the field of molecular modelling and computational chemistry,
theoretical models of molecular systems are generally based on the basic
physical laws and interactions. Molecular models that are used to describe
complex molecular systems, like solvated proteins, often contain many
parameters and a large number of degrees of freedom, e.g. the positions of the
atoms in the system. The assumptions implicit in a particular model and the
parameter values that are used are generally validated by a comparison of
theoretically predicted with experimentally measured properties of the system
under consideration. The results of such a comparison between theory and
experiment can be classified as follows.

A. Agreement between theory (model) and experiment

The  good  agreement  may  be  due  to  one  or  more  of  the  following  reasons: 

1. The model is correct, that is, any other assumption used to derive
the model, or any other choice of parameter values, would give bad
agreement with experiment.

2. The property that is compared is insensitive to the assumptions or
parameter values of the model, that is, whatever parameter values
are used in the model calculation, the agreement with experiment will
be good.

3.  Compensation  of  errors  occurs,  either  by  chance  or  by  fitting  of  the 
model  parameters  to  the  desired  properties. 

B.  No  agreement  between  theory  (model)  and  experiment 
This  may  be  due  to  the  following  reasons: 

1. The model is not correct, the experiment is.

2.  The  model  is  correct,  the  experimental  data  are  not  correct. 

In this paper we will give a number of examples of case A3. We show that
when only a few numbers or properties of high-dimensional systems or of models
containing many parameters are compared to experimental data, good agreement
may easily be obtained due to compensation of errors. In these
cases the good agreement between calculated and experimental properties does not
imply that the theoretical model is correct.

2. Calculation of the crystallographic R-factor

The crystallographic reliability factor or R-value is defined as

$$R = \frac{\sum_{hkl} W(hkl)\,\Big|\,|F_{obs}(hkl)| - k_{sc}\,|F_{calc}(hkl)|\,\Big|}{\sum_{hkl} |F_{obs}(hkl)|} \times 100\,\% \tag{2.1}$$

where the calculated (F_calc) or observed (F_obs) structure factor is defined as
the Fourier transform over the unit cell

$$F(hkl) = \text{constant} \times \iiint \rho(xyz)\, e^{2\pi i (hx+ky+lz)}\, dx\,dy\,dz \tag{2.2}$$

of the electron density ρ(xyz). In an X-ray diffraction experiment, reflection
intensities

$$I(hkl) \propto |F(hkl)|^2 \tag{2.3}$$

are measured. Weight factors in reciprocal space are denoted by W(hkl).
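The sketch below makes (2.2) and (2.3) concrete. It replaces the continuous electron density ρ(xyz) by point scatterers, which turns the integral into a sum over atoms, F(hkl) = Σ_j f_j exp[2πi(hx_j + ky_j + lz_j)], with fractional coordinates. The constant scattering factors f_j and the two-atom test cell are hypothetical simplifications. Real structure-factor calculations use resolution-dependent atomic form factors and temperature factors.

```python
import numpy as np

def structure_factor(hkl, frac_xyz, f):
    """Point-atom approximation to Eq. (2.2): F(hkl) = sum_j f_j exp(2 pi i h.x_j).

    hkl      : Miller indices (h, k, l)
    frac_xyz : (N, 3) fractional atomic coordinates in the unit cell
    f        : (N,) scattering factors, assumed constant (no B-factors)
    """
    phase = 2.0 * np.pi * np.asarray(frac_xyz, dtype=float) @ np.asarray(hkl, dtype=float)
    return np.sum(np.asarray(f, dtype=float) * np.exp(1j * phase))

def intensity(hkl, frac_xyz, f):
    """Eq. (2.3): the measured intensity is proportional to |F(hkl)|^2."""
    return abs(structure_factor(hkl, frac_xyz, f)) ** 2

# Hypothetical cell: two identical atoms half a cell apart along x.
atoms = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]
print(round(intensity((1, 0, 0), atoms, [6.0, 6.0]), 6))  # 0.0 (contributions cancel)
print(round(intensity((2, 0, 0), atoms, [6.0, 6.0]), 6))  # 144.0 (contributions add)
```

Only |F|² is measured, so the phase of F is lost in the experiment. This is why model-based F_calc values are compared with the amplitudes |F_obs| in (2.1).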

From definition (2.1) it is clear that the better the agreement between
|F_calc| and |F_obs|, the lower the value of R. However, the R-value also
depends critically on the number of observations and parameters in (2.1), that
is, the R-value depends on

- the resolution (d) range of the diffraction pattern
- the number N_ref of observed reflections
- the number N_sc of scaling factors k_sc
- the parameters in the molecular model that is used to compute F_calc, etc.
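Here is a minimal sketch of (2.1). It assumes the structure-factor amplitudes are already available and that each reflection is assigned to one scaling group (for example, a resolution shell) with its own factor k_sc. As in (2.1), the weights appear only in the numerator. The toy amplitudes are invented. They only show that adding adjustable scale factors can lower R without any change to the model that produced F_calc.

```python
import numpy as np

def r_value(f_obs, f_calc, shell, k_sc, weights=None):
    """Crystallographic R-value of Eq. (2.1), in percent.

    f_obs, f_calc : observed and calculated structure-factor amplitudes
    shell         : integer scaling-group index of each reflection
    k_sc          : one scale factor per scaling group (N_sc = len(k_sc))
    weights       : W(hkl); taken as 1 for every reflection when omitted
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    k = np.asarray(k_sc, dtype=float)[np.asarray(shell)]  # scale factor per reflection
    w = np.ones_like(f_obs) if weights is None else np.asarray(weights, dtype=float)
    return 100.0 * np.sum(w * np.abs(f_obs - k * f_calc)) / np.sum(f_obs)

f_obs, f_calc = [10.0, 20.0], [11.0, 18.0]  # hypothetical amplitudes
# One common scale factor (N_sc = 1) ...
print(r_value(f_obs, f_calc, shell=[0, 0], k_sc=[1.0]))       # 10.0
# ... versus one adjustable scale factor per group (N_sc = 2).
print(r_value(f_obs, f_calc, shell=[0, 1], k_sc=[1.0, 1.1]))  # approximately 4.0
```

In the limit of one scale factor per reflection, R can be driven to zero for any model. This is the extreme form of the effect described in the text.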

Table I illustrates how the R-value may be reduced by reducing
the number of observations (resolution range d, number of reflections N_ref) or
by increasing the number of adjustable parameters (number of scaling factors
N_sc). For one protein structure the R-value may attain values between 35 % and
24 %, depending on the choice of the parameters involved in (2.1). This
illustrates the well-known fact that a low R-value, that is, good
agreement with experimental data according to (2.1), does not necessarily
imply that the correct molecular structure has been found.

Table I

Dependence of R-value upon number of parameters

resolution range d (Å) | number of reflections N_ref | number of scaling factors N_sc | R-value (%) a)

a) Values for the X-ray structure of α-bungarotoxin (Love & Stroud, 1986;
Brookhaven Protein Data Bank, Bernstein et al., 1977; J. Finer, private
communication).


3. Density of a cytidine derivative crystal

Table II shows the unit cell dimensions of a cytidine
derivative (Verdegaal et al., 1981) whose crystal structure has been
determined at 1 atm pressure and two different temperatures, viz. 113 K
(Verdegaal et al., 1981) and 289 K (James & Sielecki, unpublished work).
Molecular dynamics (MD) simulations of 4 unit cells using crystalline periodic
boundary conditions and the constant pressure, constant temperature, variable
volume algorithm of Berendsen et al. (1984) have been performed at these
two temperatures in order to compare model and experimental properties
(Van Gunsteren & Berendsen, 1985).


Table II

Density of cytidine derivative crystals a)

4 unit cells: dimensions (nm) or volume (nm³) | 113 K experiment | 113 K MD | 289 K experiment | 289 K MD
2a | 1.788 | 1.792 | 1.778 | 1.790
2b | 2.065 | 2.050 | 2.077 | 2.064
c | 2.758 | 2.772 | 2.864 | 2.822
volume (4abc) | 10.18 | 10.18 | 10.56 | 10.43

a) Van Gunsteren & Berendsen, 1985; Verdegaal et al., 1981.

The unit cell volume is very well reproduced: the deviation is 0.0 %
(113 K) and about 1 % (289 K). However, this nearly perfect agreement is due to
compensation of errors. At 113 K the model yields too small a value for b and
too large values for a and c; at 289 K, a is too large and b and c are too
small. By comparing three quantities instead of one, the agreement
with experiment is reduced.
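A few lines of arithmetic on the Table II entries make the cancellation explicit. The sketch copies the tabulated values, including the printed volume row, and computes the relative deviation (MD − experiment)/experiment for each quantity.

```python
# Table II values: 2a, 2b, c in nm; V = volume of 4 unit cells in nm^3.
TABLE_II = {
    113: {"exp": {"2a": 1.788, "2b": 2.065, "c": 2.758, "V": 10.18},
          "md":  {"2a": 1.792, "2b": 2.050, "c": 2.772, "V": 10.18}},
    289: {"exp": {"2a": 1.778, "2b": 2.077, "c": 2.864, "V": 10.56},
          "md":  {"2a": 1.790, "2b": 2.064, "c": 2.822, "V": 10.43}},
}

def relative_deviation(md, exp):
    """(MD - experiment) / experiment, in percent."""
    return 100.0 * (md - exp) / exp

def deviations(temp):
    """Percent deviation of every tabulated quantity at one temperature."""
    data = TABLE_II[temp]
    return {q: relative_deviation(data["md"][q], data["exp"][q]) for q in data["exp"]}

for temp in TABLE_II:
    print(temp, "K:", {q: f"{d:+.1f}%" for q, d in deviations(temp).items()})
```

The individual cell edges are off by up to about 1.5 %, with mixed signs. Because the volume is a product of the three edges, deviations of opposite sign partly cancel in it, so comparing only the volume overstates the quality of the model.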

4. Conservation of total energy when integrating Newton's equations of motion

In this section we give an example of compensation of errors leading to
incorrect conclusions with respect to the efficiency of two different
integration algorithms which are used in molecular dynamics (MD) simulations.

In Newtonian dynamics the total energy (E_tot), the sum of the kinetic
(E_kin) and the potential (E_pot) energy, is conserved:

$$E_{tot} = \frac{1}{2}\sum_{i=1}^{N} m_i v_i^2 + V(\mathbf{r}_1, \ldots, \mathbf{r}_N) = \text{constant} \tag{4.1}$$
This means that the better Newton's equations are integrated,
the smaller the fluctuation in the total energy

$$\Delta E_{tot} = \left\langle \left(E_{tot} - \langle E_{tot} \rangle\right)^2 \right\rangle^{1/2} \tag{4.2}$$

will be. Here averaging over the trajectory is denoted by ⟨...⟩. Therefore,
one may think that the size of ΔE_tot can be used to judge the performance of
an integration algorithm.
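Equation (4.2) is the root-mean-square deviation of E_tot from its trajectory average, i.e. the population standard deviation of the sampled energies. Here is a minimal sketch with an invented energy series:

```python
import numpy as np

def total_energy_fluctuation(e_tot):
    """Eq. (4.2): RMS fluctuation of E_tot about its trajectory average."""
    e = np.asarray(e_tot, dtype=float)
    return float(np.sqrt(np.mean((e - e.mean()) ** 2)))

# Hypothetical E_tot values (kcal/mol) sampled along a trajectory.
print(total_energy_fluctuation([-100.0, -98.0, -100.0, -98.0]))  # 1.0
```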


Table III

Conservation of total energy in MD simulations a)

Fluctuation ΔE_tot of the total energy (kcal/mol)

Algorithm | bond vibrations | Δt = 0.5 fs | Δt = 1 fs | Δt = 2 fs
Beeman | yes | 0.004 | 0.30 | 1.1
Verlet | yes | 0.3 | 1.2 | 4.3
Verlet/SHAKE | no | 0.09 | 0.2 | 1.0

a) Data for MD simulations of the protein BPTI, taken from Levitt (1983).

Table III gives the fluctuation of the total energy for two
different algorithms, the Beeman and the Verlet algorithm. The data are from
Levitt (1983). Since the Beeman algorithm produces the smallest ΔE_tot, Levitt
concludes that it is better than the Verlet algorithm. However, it can be
shown (Åqvist et al., 1985; Berendsen & Van Gunsteren, 1986) that the Beeman
algorithm (Beeman, 1976)

$$x(t+\Delta t) = x(t) + v(t)\,\Delta t + \left[4a(t) - a(t-\Delta t)\right](\Delta t)^2/6 \tag{4.3.a}$$

$$v(t+\Delta t) = v(t) + \left[2a(t+\Delta t) + 5a(t) - a(t-\Delta t)\right]\Delta t/6 \tag{4.3.b}$$

produces exactly the same trajectory as the Verlet algorithm (Verlet, 1967)

$$x(t+\Delta t) = 2x(t) - x(t-\Delta t) + a(t)(\Delta t)^2 \tag{4.4.a}$$

$$v(t) = \left[x(t+\Delta t) - x(t-\Delta t)\right]/(2\Delta t). \tag{4.4.b}$$

Positions x(t), velocities v(t) and accelerations a(t) are functions of time t,
which is integrated over in steps of size Δt. If the Beeman and Verlet
trajectories are identical, how can the different ΔE_tot values in the columns
of Table III be explained?

The identical trajectories yield identical ΔE_pot values, but the Verlet
velocity formula (4.4.b) is less accurate than the Beeman formula (4.3.b).
This leads to

$$\Delta E_{kin}(\text{Verlet}) > \Delta E_{kin}(\text{Beeman}) \tag{4.5}$$

and, because of

$$\Delta E_{pot}(\text{Verlet}) = \Delta E_{pot}(\text{Beeman}), \tag{4.6}$$

to

$$\Delta E_{tot}(\text{Verlet}) > \Delta E_{tot}(\text{Beeman}). \tag{4.7}$$

We note that the Verlet trajectory is independent of formula (4.4.b), since
x(t+Δt) depends only on previous positions and the acceleration, not on the
velocity. Relation (4.7) is reflected in the first two lines of table III.
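The argument can be checked on a toy problem. The sketch integrates a one-dimensional harmonic oscillator with Beeman's scheme (4.3) and with Verlet's position formula (4.4.a), starting both from the same first two positions. The parameters (m = k = 1, Δt = 0.1, so ωΔt = 0.1) are illustrative assumptions, not Levitt's BPTI setup. The unknown a(−Δt) at the first Beeman step is set to a(0), which is also an assumption. The positions should agree to rounding error. The total energy, however, should fluctuate more when evaluated with the Verlet velocity (4.4.b) than with the Beeman velocity (4.3.b).

```python
import numpy as np

K, M, DT, N_STEPS = 1.0, 1.0, 0.1, 2000  # illustrative oscillator, omega*dt = 0.1

def accel(x):
    """Acceleration of a 1D harmonic oscillator, a = -(k/m) x."""
    return -K / M * x

def beeman(x0, v0):
    """Beeman algorithm, Eqs. (4.3.a) and (4.3.b)."""
    x = np.empty(N_STEPS + 1)
    v = np.empty(N_STEPS + 1)
    x[0], v[0] = x0, v0
    a_prev = a_curr = accel(x0)  # assumption: a(-dt) = a(0) for the first step
    for n in range(N_STEPS):
        x[n + 1] = x[n] + v[n] * DT + (4.0 * a_curr - a_prev) * DT**2 / 6.0
        a_next = accel(x[n + 1])
        v[n + 1] = v[n] + (2.0 * a_next + 5.0 * a_curr - a_prev) * DT / 6.0
        a_prev, a_curr = a_curr, a_next
    return x, v

def verlet(x0, x1):
    """Verlet positions, Eq. (4.4.a), and central-difference velocities, Eq. (4.4.b)."""
    x = np.empty(N_STEPS + 1)
    x[0], x[1] = x0, x1
    for n in range(1, N_STEPS):
        x[n + 1] = 2.0 * x[n] - x[n - 1] + accel(x[n]) * DT**2
    v = np.full(N_STEPS + 1, np.nan)
    v[1:-1] = (x[2:] - x[:-2]) / (2.0 * DT)  # only defined at interior steps
    return x, v

def energy_fluctuation(x, v):
    """Eq. (4.2) applied to E_tot = 1/2 m v^2 + 1/2 k x^2."""
    return float(np.std(0.5 * M * v**2 + 0.5 * K * x**2))

x_b, v_b = beeman(1.0, 0.0)
x_v, v_v = verlet(x_b[0], x_b[1])  # same first two positions
inner = slice(1, N_STEPS)          # steps where both velocity formulas exist
max_dx = float(np.max(np.abs(x_b - x_v)))
de_beeman = energy_fluctuation(x_b[inner], v_b[inner])
de_verlet = energy_fluctuation(x_v[inner], v_v[inner])
print(f"max |x_Beeman - x_Verlet| = {max_dx:.1e}")
print(f"dE_tot: Beeman {de_beeman:.2e}, Verlet {de_verlet:.2e}")
```

Since the trajectories coincide, the potential-energy series are the same, and the whole difference in ΔE_tot comes from the velocity formula. This mirrors relations (4.5) to (4.7).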

The third line shows the effect on ΔE_tot of removing the
highest-frequency (bond-length) vibrations from the molecule, which can be
done by applying the procedure SHAKE (Ryckaert et al., 1977). Removal of
high-frequency motions yields better energy conservation, as expected.

So the last column (Δt = 2 fs) of table III contains an example of
obtaining almost identical numbers for different reasons. Beeman, with bond
vibrations inaccurately integrated (Δt = 2 fs is too large), yields ΔE_tot = 1.1
kcal/mol. When using Verlet, the poor velocity formula (4.4.b) raises ΔE_tot to
4.3 kcal/mol, but the trajectory is not changed, due to the equivalence of
(4.3) and (4.4.a). Upon removal of the bond vibrations the equations of motion
are more accurately integrated, which leads to a reduction of ΔE_tot to 1.0
kcal/mol. The first two columns of table III show that the
removal of bond vibrations cannot completely cancel the large kinetic energy
fluctuations due to (4.4.b) when Δt is taken smaller. A proper analysis of
Levitt's data (1983) leads to the conclusion that the Verlet/SHAKE algorithm
yields more accurately integrated trajectories than the Beeman algorithm,
whereas a loose comparison of ΔE_tot values leads to the opposite (incorrect)
conclusion.

5. The computation of free enthalpy: adequate sampling

The relative free energy or enthalpy difference ΔG(A→B) = ΔG_BA = G_B − G_A
between two states A and B of a molecular system can be obtained by the
technique of thermodynamic integration (see e.g. Van Gunsteren & Weiner,
1989). The potential energy V(r) is made a function of a coupling parameter λ
in such a manner that V(r, λ_A) corresponds to the system in state A, and
V(r, λ_B) to state B. During a simulation the coupling parameter is slowly
changed from λ_A to λ_B. If the change is performed reversibly, ΔG is obtained as
an integral over the configuration space accessible to the system. The
reliability of the obtained ΔG value strongly depends on the extent of the
sampling of configuration space. Partial information on the adequacy of the
sampling can be obtained by changing λ forward from λ_A to λ_B and backward to
λ_A, and computing the hysteresis along the closed loop

$$\Delta G_{hys} = \Delta G(A \to B) + \Delta G(B \to A). \tag{5.1}$$
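To show the mechanics of thermodynamic integration, the sketch below uses the standard TI expression ΔG(A→B) = ∫ ⟨∂V/∂λ⟩_λ dλ on a system where everything is known analytically. The system is a single harmonic coordinate whose force constant is switched linearly from k_A to k_B, V(x, λ) = ½[(1−λ)k_A + λk_B]x². Configurations are drawn directly from the Boltzmann distribution at each λ (a Gaussian). This stands in for adequate sampling (t_MD ≫ τ_system). The exact result is (kT/2) ln(k_B/k_A). All parameter values are illustrative assumptions, and the integral uses the trapezoidal rule on a fixed λ grid.

```python
import numpy as np

KT, K_A, K_B = 1.0, 1.0, 4.0  # illustrative units and force constants
rng = np.random.default_rng(0)

def k_of(lam):
    """Force constant along the coupling path."""
    return (1.0 - lam) * K_A + lam * K_B

def mean_dV_dlambda(lam, n_samples=20000):
    """<dV/dlambda> at fixed lambda from Boltzmann-distributed samples of x."""
    x = rng.normal(0.0, np.sqrt(KT / k_of(lam)), size=n_samples)
    return np.mean(0.5 * (K_B - K_A) * x**2)

def integrate(lams):
    """Trapezoidal integral of <dV/dlambda> along the given lambda path."""
    vals = np.array([mean_dV_dlambda(lam) for lam in lams])
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(lams)))

lams = np.linspace(0.0, 1.0, 21)
dg_forward = integrate(lams)         # Delta G(A -> B)
dg_backward = integrate(lams[::-1])  # Delta G(B -> A); the lambda steps are negative
dg_hys = dg_forward + dg_backward    # Eq. (5.1)
exact = 0.5 * KT * np.log(K_B / K_A)
print(f"forward {dg_forward:.3f}, backward {dg_backward:.3f}, "
      f"hysteresis {dg_hys:.3f}, exact {exact:.3f}")
```

Each λ point here is sampled from its own equilibrium distribution, so the hysteresis reflects statistical noise only. Reproducing the systematic, lag-induced hysteresis of a too-short t_MD would require an actual time-dependent simulation in which λ changes during the run.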


Table IV

Hysteresis in free enthalpy calculation a)

Condition | ΔG_hys | Obtained ΔG value
t_MD ≪ τ_system | ≈ 0 | incorrect
t_MD ≈ τ_system | ≠ 0 | incorrect
t_MD ≫ τ_system | ≈ 0 | correct

a) The relaxation time of the system (environment) is denoted by τ_system. The
time period of the MD simulation over which the change from state A to state B
is performed is denoted by t_MD.
Three cases can be distinguished (table IV). If the length t_MD of the MD
simulation is much longer than the slowest relaxation time τ_system of the
system, sampling is likely to be sufficient, ΔG_hys ≈ 0 and the obtained ΔG
value is reliable. When t_MD ≈ τ_system, sampling is insufficient and the change
A→B→A is irreversible, leading to ΔG_hys ≠ 0, which indicates an incorrect ΔG value.
If t_MD ≪ τ_system, the hysteresis will be nearly zero, since the system cannot
adapt itself at all to the change A→B→A. An incorrect ΔG value is then obtained
despite the observation ΔG_hys ≈ 0.

The last two cases are illustrated in table V for the process of changing
a Ne atom into a Na+ ion in aqueous solution. For the shorter t_MD the
hysteresis is considerable; it is only reduced when t_MD > τ_water (dielectric
relaxation time ≈ 8 ps; rotational correlation time ≈ 2 ps). Nevertheless the
averaged free enthalpy estimate ΔG_av is remarkably independent of t_MD, due to
compensation of errors in the forward and backward integration.

Table V

Hysteresis as a function of integration period a)

length of MD simulation t_MD (ps) | ΔG_hys = ΔG(A→B) + ΔG(B→A) (kJ/mol) | ΔG_av = ½[ΔG(A→B) − ΔG(B→A)] (kJ/mol)
5 | 39 | -422
10 | 29 | -419
20 | 18 | -420
40 | 1 | -421
80 | 1 | -421

a) Data for the process of changing neon (state A) in aqueous solution (216
H2O, Δt = 1 fs, Rc = 0.9 nm) into a sodium ion (state B), taken from Straatsma (1987).
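Table V reports only the sum and the half-difference of the two directions, so the individual forward and backward estimates can be recovered by simple algebra. The sketch assumes ΔG_av = ½[ΔG(A→B) − ΔG(B→A)] and uses the tabulated values.

```python
# Table V (kJ/mol): t_MD (ps) -> (Delta G_hys, Delta G_av)
TABLE_V = {5: (39, -422), 10: (29, -419), 20: (18, -420), 40: (1, -421), 80: (1, -421)}

def forward_backward(dg_hys, dg_av):
    """Invert dG_hys = dG(A->B) + dG(B->A) and dG_av = [dG(A->B) - dG(B->A)] / 2."""
    dg_ab = dg_av + 0.5 * dg_hys
    dg_ba = 0.5 * dg_hys - dg_av
    return dg_ab, dg_ba

for t_md, (hys, av) in TABLE_V.items():
    ab, ba = forward_backward(hys, av)
    print(f"t_MD = {t_md:2d} ps: dG(A->B) = {ab:7.1f}, -dG(B->A) = {-ba:7.1f} kJ/mol")
```

At t_MD = 5 ps the forward estimate (-402.5 kJ/mol) and the reversed backward estimate (-441.5 kJ/mol) lie 19.5 kJ/mol on either side of the average. Their errors cancel in ΔG_av, which is why that column hardly changes with t_MD.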

6. The computation of free enthalpy: long-range contributions

In the previous example of the change Ne → Na+, a full charge was created.
Since the Coulomb interaction is inversely proportional to the first power of
the charge-charge distance (~ r⁻¹), the free enthalpy ΔG of creating a charge
will depend on the range (or cut-off radius Rc) of the interaction that
is taken into account. This is illustrated in table VI: the larger the
solute-water Rc, the lower ΔG will be. Even when the Born formula (Born, 1920;
Straatsma, 1987) for a charge q,

$$\Delta G_{Born} = -\frac{q^2}{8\pi\varepsilon_0 R_c}\left(1 - \frac{1}{\varepsilon}\right) \tag{6.1}$$

is used to calculate the contribution from the continuum dielectric beyond Rc
(with ε = 80 for water), the resulting ΔG value is sensitive to the Rc value.
Straatsma (1987) also showed that a variety of free enthalpy estimates can be
obtained by using different values for the cut-off radii of solute-water and
water-water interactions. When creating or annihilating a charge, the free
energy change is very sensitive to the cut-off radius Rc that is used in the
calculation.


Table VI

Free enthalpy as a function of cut-off radius a)

Rc (nm), solute-water | Rc (nm), water-water | ΔG from MD (kJ/mol) | Born correction, interval (Rc, ∞) (kJ/mol) | Sum (kJ/mol)
0.9 | 0.9 | -424 | -76 | -500
1.2 | 0.9 | -461 | -57 | -518
0.9 | 1.2 | -404 | -76 | -480
1.2 | 1.2 | -429 | -57 | -486

a) Data for the process of changing Ne (state A) into Na+ (state B) in aqueous
solution (512 H2O, t_MD = 40 ps, Δt = 1 fs), taken from Straatsma (1987).
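Equation (6.1) can be evaluated directly. The sketch assumes a unit charge (q = +1 e), ε = 80, and the conversion factor 1/(4πε₀) ≈ 138.935 kJ mol⁻¹ nm e⁻². With these inputs it reproduces the Born-correction column of table VI, whose values depend only on the solute-water cut-off.

```python
COULOMB = 138.935458  # 1/(4 pi eps0) in kJ mol^-1 nm e^-2

def born_correction(r_c, q=1.0, eps=80.0):
    """Eq. (6.1): Born free enthalpy (kJ/mol) of the dielectric beyond r_c (nm), charge q in e."""
    return -COULOMB * q**2 / (2.0 * r_c) * (1.0 - 1.0 / eps)

for r_c in (0.9, 1.2):
    print(r_c, round(born_correction(r_c)))  # 0.9 -76, then 1.2 -57
```

The correction changes by about 19 kJ/mol between the two cut-offs. The MD part changes by a different amount (for example -424 versus -461 kJ/mol at a water-water cut-off of 0.9 nm), so the sums in table VI remain Rc-dependent.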


7. The computation of free enthalpy: dependence on model parameters

Even when no full charges are created, the free energy of hydration can be
very sensitive to the model parameters that are used in the calculation. This is
illustrated in tables VII and VIII for the simple process of changing an H2O
molecule into a CH3OH molecule in aqueous solution. Models I and II have been
taken from the literature (Hermans et al., 1984; Van Gunsteren & Berendsen,
1987; Jorgensen, 1981) and are both reasonable CH3OH models. However, they
differ by 7 kJ mol⁻¹ in free enthalpy of hydration with respect to the SPC
water model (Berendsen et al., 1981). When combining the geometry and Van der
Waals parameters of model I with the charges of model II, a completely
different ΔG value is found. This shows that the free enthalpy of hydration is
very sensitive to the parameters used in the molecular model.

Since a free enthalpy estimate is critically dependent on sampling,
cut-off radius and model parameters, good agreement of simulated with
experimental free enthalpy may easily be due to compensation of errors.

Table VII

Models for methanol-water interaction a)

Geometries and charges

Molecule | atom charges | bond lengths | bond angle
H2O b) | OW -0.82e, HW +0.41e | O-H 0.1 nm | H-O-H 109.5°
CH3OH (I) c) | OA -0.548e, HA +0.398e, CH3 +0.15e | C-O 0.143 nm, O-H 0.1 nm | C-O-H 109.5°
CH3OH (II) d) | OA -0.685e, HA +0.40e, CH3 +0.285e | C-O 0.143 nm, O-H 0.0945 nm | C-O-H 108.5°

Van der Waals parameters e)

Atom pair | C12 (kcal mol⁻¹ Å¹²) | C6 (kcal mol⁻¹ Å⁶)
OW - OW | 793.3 × 793.3 | 25.01 × 25.01
OW - CH3 (I) | 421.0 × 2500 | 25.01 × 46.06
OW - OA (I) | 793.3 × 600.0 | 25.01 × 23.25
OW - CH3 (II) | 421.0 × 2820 | 25.01 × 48.99
OW - OA (II) | 793.3 × 717.6 | 25.01 × 24.49

a) Data from: b) Berendsen et al. (1981); c) Hermans et al. (1984), Van Gunsteren &
Berendsen (1987); d) Jorgensen (1981).

e) The hydrogen atoms do not have Van der Waals interactions.


Table VIII

Free enthalpy of methanol hydration

Process a) | ΔG (kJ/mol)
H2O → CH3OH (I) |
H2O → CH3OH (II) |
H2O → CH3OH (I,II) b) |

a) Change carried out over 20 ps, Rc = 0.8 nm, Δt = 2 fs, for a periodic box
containing 216 H2O molecules. The experimental value is 5 kJ mol⁻¹ (Ben-Naim &
Marcus, 1984).

b) Geometry and Van der Waals parameters from model I, charges from model II
(P.A. Kollman, private communication).

8. Spatial molecular structure determination by 2D-NMR

The last example of obtaining good agreement between theoretical and
experimental data by compensation of errors lies in the area of structure
determination by nuclear magnetic resonance (NMR) measurements (Wüthrich,
1986; Kaptein et al., 1988). High-resolution NMR is able to resolve individual
proton resonances of proteins in solution. Once the observed resonances have
been assigned to individual protons, two-dimensional (2D) nuclear Overhauser
enhancement (NOE) spectra can be used to obtain upper (and lower) limits, or
distance constraints, on the distances between pairs of protons. The next
problem is the derivation of a 3D structure satisfying these distance
constraints. Crude molecular structures that approximately satisfy the
constraints can be refined by MD simulation using a simple energy term
(Kaptein et al., 1985)

$$V_{dc} = \sum_{n=1}^{N_{dc}} \tfrac{1}{2} K_{dc} \left(r_{ij} - r_{ij}^{0}\right)^2 \tag{8.1}$$

which represents the set of N_dc distance constraints, where the distance
constraint between atoms i and j is denoted by r⁰_ij. The term V_dc is added to
the normal interaction function V. By performing MD simulation and subsequent
energy minimization (EM), one searches for a conformation which has a low total
energy E + E_dc.
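Here is a minimal sketch of (8.1) together with the violation sums used in table IX. As written, (8.1) penalizes deviations in both directions. The violation measure in the sketch counts only distances that exceed r⁰_ij, treating each r⁰_ij as an NOE upper limit. That is an assumption about how the table IX sums were evaluated. The three-atom coordinates are hypothetical.

```python
import numpy as np

def pair_distances(coords, pairs):
    """Distances r_ij (nm) for the restrained atom pairs."""
    coords = np.asarray(coords, dtype=float)
    i, j = np.asarray(pairs).T
    return np.linalg.norm(coords[i] - coords[j], axis=1)

def restraint_energy(r, r0, k_dc):
    """Eq. (8.1): V_dc = sum over constraints of 1/2 K_dc (r_ij - r0_ij)^2."""
    return float(np.sum(0.5 * k_dc * (r - np.asarray(r0, dtype=float)) ** 2))

def violations(r, r_upper):
    """Sum and average of the amounts by which distances exceed their upper limits."""
    excess = np.maximum(0.0, r - np.asarray(r_upper, dtype=float))
    return float(excess.sum()), float(excess.mean())

# Hypothetical three-atom example (nm): pair (0, 1) is 0.05 nm too long.
coords = [[0.0, 0.0, 0.0], [0.3, 0.0, 0.0], [0.0, 0.5, 0.0]]
pairs = [(0, 1), (0, 2)]
r0 = [0.25, 0.5]
r = pair_distances(coords, pairs)
print(restraint_energy(r, r0, k_dc=4000.0))  # about 5.0 kJ/mol
print(violations(r, r0))                     # about (0.05, 0.025) nm
```

The average is the sum divided by N_dc. For example, the 215 lac repressor constraints listed in table X turn a violation sum of 3.242 nm into an average of about 0.015 nm, consistent with the first line of table IX.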


The term V_dc forces the molecule to satisfy the experimental data
(constraints). Therefore, the larger its weight K_dc is, the smaller the
distance constraint violations will be. But the agreement with the
experimental data is generally paid for by a higher energy E of the molecule,
that is, the molecule adopts a more strained conformation in order to satisfy
the distance constraints.

An example is found in table IX, where a short MD refinement of two lac
repressor structures is shown. The starting structures were taken from De
Vlieg et al. (1986): the final (best) structure, and a structure which still
had its loop wrongly folded (after 20 ps of refinement). MD refinement with a
large weight K_dc = 17000 kJ mol⁻¹ nm⁻² gives the wrongly folded structure
lower constraint violations (0.180 nm) than the correctly folded structure
(0.258 nm). This occurs at the expense of a higher energy: -2996 kJ
mol⁻¹ compared to -3053 kJ mol⁻¹. When the constraint energy term is switched
off (K_dc = 0), EM and MD relax the molecule at the expense of larger
constraint violations. We conclude that a 3D structure obtained from NMR should
not only show small violations of distance constraints, but must also have a
low energy. Otherwise one may have obtained a wrongly folded structure in
which the experimental data are satisfied at the expense of a strained,
non-stable conformation.
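The trade-off between constraint violation and strain can be seen in a deliberately tiny, hypothetical model. A single coordinate has a physical energy minimum at x = 1 and a constraint pulling it toward x = 2, both harmonic, so the minimum of the combined energy has a closed form. This is an illustration of the principle only, not the lac repressor calculation.

```python
def restrained_minimum(k_phys, x_phys, k_dc, x_dc):
    """Minimize E(x) + V_dc(x) with E = 1/2 k_phys (x - x_phys)^2, V_dc = 1/2 k_dc (x - x_dc)^2.

    Returns the minimizing x, the physical (strain) energy E there,
    and the remaining constraint violation |x - x_dc|.
    """
    x = (k_phys * x_phys + k_dc * x_dc) / (k_phys + k_dc)
    energy = 0.5 * k_phys * (x - x_phys) ** 2
    violation = abs(x - x_dc)
    return x, energy, violation

for k_dc in (0.0, 1.0, 10.0, 100.0):
    x, e, viol = restrained_minimum(1.0, 1.0, k_dc, 2.0)
    print(f"K_dc = {k_dc:6.1f}: violation = {viol:.3f}, strain energy = {e:.3f}")
```

As K_dc grows, the violation falls toward zero while the physical energy rises toward the cost of fully satisfying the constraint. A small violation alone therefore says nothing about whether the structure is energetically plausible.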


Table IX

MD refinement of lac repressor structures a)

Structure I: loop wrongly folded. Structure II: loop correctly folded.

Step | I: violation sum (nm) | I: violation average (nm) | I: energy (kJ/mol) | II: violation sum (nm) | II: violation average (nm) | II: energy (kJ/mol)
1. initial structure | 3.242 | 0.015 | -2115 | 0.405 | 0.002 | -3092
2. after 5 ps MD + EM, K_dc = 17000 kJ mol⁻¹ nm⁻² | 0.180 | 0.001 | -2996 | 0.258 | 0.001 | -3053
3. after EM, K_dc = 0 | 0.523 | 0.002 | -3083 | 0.461 | 0.002 | -3100
4. after 5 ps MD + EM, K_dc = 0 | 3.297 | 0.015 | -3032 | 1.823 | 0.008 | -3102

a) Initial structures were taken from De Vlieg et al. (1986).




Table X

Energies of X-ray and NMR protein structures

protein             number of  source of data   Ndc   Kdc               average     energy          force field
                    residues   or structure           (kJ mol⁻¹ nm⁻²)   violation   (kJ mol⁻¹)
                                                                        (nm)
aPP                 36         X-ray a)         -     -                 -           -2180           GROMOS j)
crambin             46         X-ray a)         -     -                 -           -2161           GROMOS
BPTI                58         X-ray a)         -     -                 -           -3529           GROMOS
phospholipase A2    123        X-ray a)         -     -                 -           -7848           GROMOS
lac repressor       51         NMR b)           215   4000              0.003       -3091           GROMOS
tendamistat         74         NMR c)           842   -                 -           -3140, -2834    GROMOS
crambin             46         X-ray/NMR d)     240   5000              0.033       -2247           CHARMM k)
hirudin             56         NMR e)           359   17000             0.016       -1138           CHARMM
histone H5          79         NMR f)           307   17000             0.015       -1527           CHARMM
CPI                 39         NMR g)           309   33000             0.007       -724            CHARMM
phoratoxin          46         NMR h)           331   33000             0.010       -1029           CHARMM
α1-purothionin      46         NMR i)           310   33000             0.023       -498            CHARMM

a) X-ray structures from the Brookhaven Protein Data Bank (Bernstein et al., 1977).
b) De Vlieg et al. (1986).
c) Kline et al. (1988); the lowest and highest energies of a set of structures are given.
d) Clore et al. (1986a).
e) Clore et al. (1987a).
f) Clore et al. (1987b).
g) Clore et al. (1987d).
h) Clore et al. (1987c).
i) Clore et al. (1986b).
j) Van Gunsteren & Berendsen (1987).
k) Brooks et al. (1983).
How low should the molecular energy be? In Table X we list the energies of a number of proteins for which a high-resolution X-ray structure is known. As expected, the energy depends on the size (number of residues) of the protein. The NMR-based structures of the lac repressor and tendamistat both have energies comparable to those of X-ray determined structures. In contrast, the last five structures in Table X display a relatively high energy, at least about 1000 kJ mol⁻¹ higher than that of an X-ray structure of comparable size. This may be due to the use of a large Kdc, which lowers the constraint violations at the expense of a high internal energy. The discrepancy in energy cannot be due to the use of different force fields. CHARMM (Brooks et al., 1983) and GROMOS (Van Gunsteren & Berendsen, 1987) consist of comparable atom-atom interaction terms: MD refinement of crambin using theoretical distance constraints derived from the X-ray structure leads to an energy of -2247 kJ mol⁻¹ in the CHARMM force field (Clore et al., 1986a), while the GROMOS force field yields a comparable value of -2161 kJ mol⁻¹ for the crambin X-ray structure.

We conclude that a wrongly folded structure may be forced to agree with a set of experimental distance constraints by using large weights in the distance constraint energy term. When a refined protein structure displays a relatively high energy, compared with the energies of X-ray structures of proteins of comparable size, this must be taken as a warning that a (partially) wrongly folded structure has been obtained.
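The warning criterion can be written as a simple screening rule. The sketch below is hypothetical: it compares each refined structure with the X-ray structure of Table X that is closest in residue count, and flags it when its energy is higher by more than 1000 kJ mol⁻¹ (the gap quoted in the text). It mixes GROMOS and CHARMM energies, which the crambin comparison above suggests is acceptable at this level of precision; for tendamistat only the higher energy of the reported set is used.

```python
# X-ray reference structures from Table X: name -> (residues, energy in kJ/mol).
XRAY_REFERENCE = {
    "aPP": (36, -2180),
    "crambin": (46, -2161),
    "BPTI": (58, -3529),
    "phospholipase A2": (123, -7848),
}

# Refined structures from Table X: name -> (residues, energy in kJ/mol).
REFINED = {
    "lac repressor": (51, -3091),
    "tendamistat": (74, -2834),          # higher energy of the reported set
    "crambin (X-ray/NMR)": (46, -2247),
    "hirudin": (56, -1138),
    "histone H5": (79, -1527),
    "CPI": (39, -724),
    "phoratoxin": (46, -1029),
    "alpha1-purothionin": (46, -498),
}


def energy_warning(n_res, energy, reference=XRAY_REFERENCE, threshold=1000.0):
    """Hypothetical rule: flag a structure whose energy exceeds that of the
    X-ray structure closest in size by more than `threshold` kJ/mol.
    Returns (flagged, excess_energy)."""
    _, ref_energy = min(reference.values(), key=lambda p: abs(p[0] - n_res))
    excess = energy - ref_energy
    return excess > threshold, excess


if __name__ == "__main__":
    for name, (n_res, energy) in REFINED.items():
        flagged, excess = energy_warning(n_res, energy)
        print(f"{name:22s} excess {excess:6d} kJ/mol  {'WARNING' if flagged else 'ok'}")
```

With these numbers the rule singles out the same five structures that the text identifies, but the nearest-size reference and the fixed threshold are illustrative choices, not a validated criterion.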

9. Conclusions


In this paper we have shown that good agreement between theoretically calculated and experimentally measured data does not necessarily imply that the theoretical model is correct. Compensation of errors may underlie the good agreement. Especially for high-dimensional systems modelled with many parameters, it is relatively easy to choose or fit parameters such that good agreement is obtained for a limited number of observable quantities.
THEORETICAL STUDY OF A CONFORMATIONAL CHANGE IN AN ENZYME: METHODOLOGY AND
FIRST RESULTS ON CITRATE SYNTHASE

J. DURUP

Laboratoire de Physique Quantique (UA 505 du C.N.R.S.), 118, route de Narbonne,
31062 Toulouse (France)

Following the lines of, and using the codes developed at Harvard by, M. Karplus and his colleagues, we introduced an alternative strategy for dynamical calculations on proteins at various time scales. It essentially consists of the following steps.

(i) We constructed a set of internal vectors based on the principle of the binary tree, adapted to the particular structure of each residue and to the secondary and tertiary structures of the protein. The transformation of Cartesian coordinates, velocities and energy gradients to the new set, and vice versa, needs only purely topological, constant matrices, whose application requires very little computer time; the kinetic energy operator remains diagonal, and thus the Hamiltonian dynamics reduces to simple Newtonian dynamics.

(ii) From these internal vectors we generated a set of relative polar coordinates, in which each vector is referred to the immediately lower ones in the tree; we then determined, through a dynamical test, the frequencies associated with each of these coordinates, which permits a classification into slow and fast modes.

(iii) We compared a regular dynamics run with the CHARMM programs, using a 1 fs integration step, with a constrained dynamics run using 8 fs steps, after freezing most of the modes with frequencies larger than ca. 1.4 × 10¹³ s⁻¹ and applying a damping coefficient to the other fast modes. It appears that the main features of the dynamics of the slower modes are preserved, apart from phase shifts occurring during the freezing process. These tests were performed on the dimeric enzyme citrate synthase (437 residues per monomeric unit).

(iv) Iteration of this procedure to larger time scales is possible and will soon be performed. One may then hope to obtain significant collective modes by diagonalization of the correlation matrix of the slow-mode amplitudes.
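Steps (iii) and (iv) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's code: coordinates are split into slow and fast sets with the 1.4 × 10¹³ s⁻¹ limit quoted in the discussion, and collective modes are obtained by diagonalizing the covariance (mean-centred second-moment) matrix of the slow-mode amplitudes; dividing by the standard deviations would give the correlation matrix proper. The function names are hypothetical.

```python
import numpy as np

# Separation limit between fast and slow modes quoted by the author.
FAST_MODE_CUTOFF = 1.4e13  # s^-1


def slow_mode_mask(frequencies, cutoff=FAST_MODE_CUTOFF):
    """True for coordinates classified as slow (frequency below the cutoff)."""
    return np.asarray(frequencies, dtype=float) < cutoff


def collective_modes(amplitudes):
    """Diagonalize the covariance matrix of slow-mode amplitudes.

    amplitudes: array of shape (n_frames, n_slow_modes).
    Returns eigenvalues in decreasing order and the matching eigenvectors as columns.
    """
    a = np.asarray(amplitudes, dtype=float)
    centered = a - a.mean(axis=0)
    cov = centered.T @ centered / (a.shape[0] - 1)
    evals, evecs = np.linalg.eigh(cov)       # symmetric matrix: ascending eigenvalues
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    common = rng.normal(size=2000)            # one hypothetical collective motion
    amps = np.column_stack([common, common, 0.1 * rng.normal(size=2000)])
    values, vectors = collective_modes(amps)
    print(values)
    print(vectors[:, 0])                      # close to (1, 1, 0)/sqrt(2), up to sign
```

In the synthetic example two slow coordinates move together, and the leading eigenvector recovers that shared motion as a single collective mode.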

Details of this work will be submitted for publication in the Journal of Physical Chemistry.
DISCUSSION


PERAHIA - How do you explain that you obtain larger rms deviations than conventional dynamics for some groups of the protein? I ask this question because dynamics using holonomic constraints gives smaller rms values.

DURUP - For money-saving reasons we did not perform the necessary pre-thermalization of (at least) 30 picoseconds with the exact dynamics, but only 2 picoseconds. This results in spurious drifts of some α-carbon atoms, in the exact as well as in the constrained dynamics, which exceed the actual rms fluctuations.


ANGYAN - Do you see any special problems in including solvent modes in your procedure?

DURUP - I expect to include solvent-effect calculations in my programs, either through a collaboration with Eastern European colleagues (R. Zahradnik, F. Aschenbach) or by using other authors' codes. This will require, as stated in my introduction, entropy as well as energy terms.


VERGOTEN - You are freezing modes with frequencies higher than 10¹³ s⁻¹, i.e., for a spectroscopist, 330 cm⁻¹. Experimentally the -NH3+ or -CH3 torsional modes appear in the range 200-250 cm⁻¹; does that mean they are not frozen in your procedure?

DURUP - The 10¹³ s⁻¹ figure I gave is an order of magnitude. The actual separation limit between faster and slower modes was rather 1.4 × 10¹³ s⁻¹. In any case it is not a strict separation, because all identical modes of residues of the same kind have to be treated on an equal footing, whatever their individual frequencies in their specific environments, for the sake of transferability of the code to any protein.
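The numbers in this exchange follow from the conversion ν̃ = ν/c. The sketch below assumes the quoted figures are ordinary (cyclic) frequencies, as in the 330 cm⁻¹ estimate; if they were angular frequencies, the wavenumbers would be smaller by a factor 2π. It gives about 334 cm⁻¹ for 10¹³ s⁻¹ and about 467 cm⁻¹ for 1.4 × 10¹³ s⁻¹, while 250 cm⁻¹ corresponds to roughly 7.5 × 10¹² s⁻¹, below either limit.

```python
C_CM_PER_S = 2.99792458e10  # speed of light in cm/s


def to_wavenumber(freq_per_s):
    """Convert an ordinary frequency nu (s^-1) to a wavenumber nu/c (cm^-1)."""
    return freq_per_s / C_CM_PER_S


def to_frequency(wavenumber_per_cm):
    """Convert a wavenumber (cm^-1) back to an ordinary frequency (s^-1)."""
    return wavenumber_per_cm * C_CM_PER_S


if __name__ == "__main__":
    print(round(to_wavenumber(1.0e13)))   # order-of-magnitude limit
    print(round(to_wavenumber(1.4e13)))   # actual separation limit
    print(f"{to_frequency(250):.2e}")     # upper end of the torsional range
```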
SOUMPASIS - Did anybody try to eliminate the fast motions analytically, i.e., to use the normal-mode collective excitations instead of the Cartesian coordinates and work with mode-mode coupling theories?

DURUP - The answer is "yes", and references were given in my fourth viewgraph (B. Brooks and Karplus 1983, Levitt et al. 1985, Noguti and Go 1985, Tobias and C. Brooks 1988, Sanejouand and Perahia, to be published).
SUMMARY

Nonarchimedean (NA) analysis was used to describe relaxation processes in chemical compound-membrane interactions and the self-excitation of membranes. The method relies upon the replacement of real time by elements of NA fields, which yields an enriched mathematical and physical structure.

In particular, it is shown that an asymptotic expansion formalism can easily accommodate nearly-exponential relaxations in multi-compartmental systems (e.g., the action potential of the frog sciatic nerve in the presence of cross-linking aldehydes).

Intermittent regimes in excitable membranes are studied using dyadic expansions and Walsh-Fourier analysis.


INTRODUCTION 

The existence of more than one characteristic time scale is a well-known feature of far-from-equilibrium biosystems. For instance, a variety of distinct time scales is displayed by the ionic pumps, now considered as channels whose energy barrier profile is transiently modified by the phosphorylation-dephosphorylation cycle (refs. 1, 2).

Many time scales characterize the relaxation of disordered media similar to membranes (ref. 3). Also, in complex biochemical reactions one or several steps of the reaction sequence become very rapid compared with the others, and a chemical instability corresponding to a new time scale usually appears.

All such phenomena have been studied using the concept of NA time. The proposed NA formalism proves to be an appropriate modelling tool if the studied system can be divided into a number of hierarchized interacting subsystems.

Such formalisms have been used to describe either nearly-linear phenomena (a perturbation method) or extremely nonlinear interactions (a Boolean method).

This work is part of a recent attempt to apply NA analysis (see ref. 4) in chemistry and biophysics (refs. 5-8).
EFFECT OF PROTEIN CROSS-LINKING ALDEHYDES ON NERVE ACTIVITY
A multi-compartmental model

The need for multi-compartmental models of relaxation in the study of chemical compound-membrane interactions arises because membrane excitability could depend on the collective behaviour of a number of molecular units (compartments), all of which have to remain intact for the excitability to survive.

Consider a system of n+1 identical compartments connected in series. Denote by $y(t)$ the time-evolving physical variable of the relaxation process (in this case, the amplitude of the compound action potential of the frog sciatic nerve; refs. 9, 10). The variable corresponding to the i-th compartment is $y^{(i)}(t)$.

The relaxation process of such a system is described by:

$$\frac{dy^{(i)}(t)}{dt} = -y^{(i)}(t) + y^{(i-1)}(t), \qquad 1 \le i \le n \tag{1}$$

where $t = \alpha\tau$ is the dimensionless time, $\tau$ is the time, and $\alpha$ is the decay or relaxation rate, assumed to be the same for every compartment. The index $i = 0$ refers to the single-compartment model (ref. 6). The solution of (1) with the initial conditions

$$y^{(0)}(0) = 1, \qquad y^{(1)}(0) = \dots = y^{(n)}(0) = 0 \tag{2}$$

is

$$y^{(i)}(t) = \frac{t^i \exp(-t)}{i!} \tag{3}$$
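Solution (3) can be checked numerically. The sketch below integrates the chain (1) with a classical fourth-order Runge-Kutta scheme, assuming the single-compartment equation dy⁽⁰⁾/dt = -y⁽⁰⁾ implied by the text for i = 0, and compares the result with tⁱ e⁻ᵗ/i!.

```python
import math


def chain_rhs(y):
    """Right-hand side of eq. (1): dy_i/dt = -y_i + y_{i-1}, with dy_0/dt = -y_0."""
    return [-y[i] + (y[i - 1] if i > 0 else 0.0) for i in range(len(y))]


def integrate_chain(n, t_end, steps=2000):
    """Classical RK4 for n+1 compartments with y_0(0) = 1 and the others 0 (eq. 2)."""
    y = [1.0] + [0.0] * n
    h = t_end / steps
    for _ in range(steps):
        k1 = chain_rhs(y)
        k2 = chain_rhs([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = chain_rhs([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = chain_rhs([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)]
    return y


def analytic(i, t):
    """Eq. (3): y_i(t) = t**i * exp(-t) / i!."""
    return t ** i * math.exp(-t) / math.factorial(i)


if __name__ == "__main__":
    t_end = 2.5
    for i, value in enumerate(integrate_chain(3, t_end)):
        print(i, value, analytic(i, t_end))   # numerical and analytic values agree
```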


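To take into account the existence of different time scales, we replace the time $t$ by the expansion $T$:

$$T = t + \varepsilon w_1 t + \dots + \varepsilon^M w_M t \tag{4}$$

Here $M$ is the number of time scales added to $t$ (so that $M + 1$ scales are involved in total), $\varepsilon$ is an expansion parameter, and $w_m$, $m = 1, \dots, M$, are constants. Denote also $T = [t, w_1 t, \dots, w_M t]$. The use of (4) induces a similar structure of $y^{(i)}(t)$, that is:

$$Y^{(i)}(T) = y_0^{(i)}(t) + \varepsilon y_1^{(i)}(t) + \dots + \varepsilon^M y_M^{(i)}(t) \tag{5}$$

where $y_m^{(i)}(t)$ are real functions.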

Both (4) and (5) can be viewed as elements of an NA field (ref. 4). This NA field is endowed with addition and multiplication rules defined as follows: if $A = [a_0, a_1]$ and $B = [b_0, b_1]$, then

$$A + B = [a_0 + b_0,\; a_1 + b_1], \qquad AB = [a_0 b_0,\; a_0 b_1 + a_1 b_0].$$

In the new frame the relaxation is described by the system of NA equations:

$$\frac{dY^{(i)}}{dT} = -Y^{(i)} + Y^{(i-1)}, \qquad 1 \le i \le n \tag{6}$$

The solution of (6) is:

$$Y^{(i)}(T) = C\,\frac{T^i \exp(-T)}{i!} \tag{7}$$

where $C$ is an NA constant. We take $w_m = 1$ and consequently $C = [1, 1, \dots, 1]$ (a normalization condition). Expanding the exponential in (7) as a power series of $T$ and using the operations in the NA field, one obtains:

$$Y^{(i)}(T) = \frac{t^i \exp(-t)}{i!} \sum_{m=0}^{M} \varepsilon^m L_m^{(i)}(t) \tag{8}$$

where $L_m^{(i)}(t)$ are the generalized Laguerre polynomials:

$$L_m^{(i)}(t) = \sum_{k=0}^{m} \binom{m+i}{m-k} \frac{(-t)^k}{k!} \tag{9}$$
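The step from (7) to (8) can be verified numerically. The sketch assumes that an NA element [a_0, ..., a_M] multiplies like a power series in ε truncated after ε^M, which for M = 1 reduces to the rule AB = [a_0 b_0, a_0 b_1 + a_1 b_0] stated above. With w_m = 1 and C = [1, ..., 1], the ε^m coefficient of C Tⁱ exp(-T)/i! should equal tⁱ e⁻ᵗ/i! · L_m⁽ⁱ⁾(t). The helper names are hypothetical.

```python
import math


def mul(a, b):
    """NA product: power-series product in eps, keeping orders 0..M."""
    M = len(a) - 1
    return [sum(a[k] * b[m - k] for k in range(m + 1)) for m in range(M + 1)]


def na_exp(a):
    """exp of an NA element: exp(a0) * sum_k N**k / k!, with nilpotent N = a - a0."""
    M = len(a) - 1
    nil = [0.0] + list(a[1:])
    term = [1.0] + [0.0] * M          # N**0
    total = list(term)
    for k in range(1, M + 1):         # N**k vanishes for k > M
        term = mul(term, nil)
        total = [s + c / math.factorial(k) for s, c in zip(total, term)]
    return [math.exp(a[0]) * s for s in total]


def na_solution(t, i, M):
    """Eq. (7) with w_m = 1 and C = [1, ..., 1]: coefficients of Y^(i)(T) in powers of eps."""
    T = [t] * (M + 1)
    C = [1.0] * (M + 1)
    power = [1.0] + [0.0] * M
    for _ in range(i):
        power = mul(power, T)
    y = mul(mul(C, power), na_exp([-x for x in T]))
    return [c / math.factorial(i) for c in y]


def laguerre(m, i, t):
    """Generalized Laguerre polynomial L_m^(i)(t), eq. (9)."""
    return sum(math.comb(m + i, m - k) * (-t) ** k / math.factorial(k)
               for k in range(m + 1))


if __name__ == "__main__":
    t, i, M = 0.7, 1, 4
    prefactor = t ** i * math.exp(-t) / math.factorial(i)
    for m, c in enumerate(na_solution(t, i, M)):
        print(m, c, prefactor * laguerre(m, i, t))   # the two columns agree
```

The agreement reflects the generating function of the generalized Laguerre polynomials: with C = T/t = 1/(1 - ε) in the truncated series, (7) becomes tⁱ e⁻ᵗ/i! · (1 - ε)^-(i+1) exp(-tε/(1 - ε)).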


A general calculated solution of (6), valid for different $w_m$ and initial conditions, is:

$$y^{(i)}(t) = \frac{t^i \exp(-t)}{i!} \sum_{m=0}^{M} q_m^{(i)} L_m^{(i)}(t) \tag{10}$$

The coefficients $q_m^{(i)}$ are related to the contribution of the m-th scale of the process.


In the NA formalism we obtain the minimum (null) distance between $y^{(i)}(t)$ given by (10) and the experimental values $y_e^{(i)}(t)$ if:

$$\binom{m+i}{m}\, q_m^{(i)} = \int_0^\infty y_e^{(i)}(t)\, L_m^{(i)}(t)\, dt \tag{11}$$

Note that in the NA frame we cannot discriminate between the solutions corresponding to different $M$ (ref. 5). The truncation number $M$ in (10) that gives the best model is therefore selected using the sum of squares of deviations:

$$S(M) = \int_0^\infty \left(y_e^{(i)}(t) - y^{(i)}(t)\right)^2 dt \tag{12}$$
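The fitting procedure of (10)-(12) can be sketched on synthetic data. The code below is an illustration under stated assumptions: it evaluates the integrals with the trapezoidal rule on a finite grid (0 ≤ t ≤ 60), uses the orthogonality of the generalized Laguerre polynomials under the weight tⁱ e⁻ᵗ to obtain q_m from (11), and then evaluates S(M) from (12). The "experimental" curve is generated from (10) with the f.a. coefficients of Table 2 used only as plausible input values, so the true truncation order is M = 3.

```python
import math
import numpy as np


def laguerre(m, i, t):
    """Generalized Laguerre polynomial L_m^(i)(t), eq. (9); works on NumPy arrays."""
    return sum(math.comb(m + i, m - k) * (-t) ** k / math.factorial(k)
               for k in range(m + 1))


def model(t, q, i):
    """Eq. (10): y^(i)(t) = t^i exp(-t)/i! * sum_m q_m L_m^(i)(t)."""
    return (t ** i * np.exp(-t) / math.factorial(i)
            * sum(qm * laguerre(m, i, t) for m, qm in enumerate(q)))


def trapezoid(f, t):
    """Trapezoidal rule on a uniform grid."""
    h = t[1] - t[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))


def fit_coefficients(y_exp, t, i, M):
    """Eq. (11): q_m = integral(y_e * L_m^(i) dt) / binom(m + i, m), for m = 0..M."""
    return [trapezoid(y_exp * laguerre(m, i, t), t) / math.comb(m + i, m)
            for m in range(M + 1)]


def objective(y_exp, t, i, M):
    """Eq. (12): S(M) = integral (y_e - y)^2 dt, with q fitted by eq. (11)."""
    q = fit_coefficients(y_exp, t, i, M)
    return trapezoid((y_exp - model(t, q, i)) ** 2, t)


if __name__ == "__main__":
    t = np.linspace(0.0, 60.0, 240001)
    y_e = model(t, [0.98, 0.054, -0.155, -0.115], i=1)   # synthetic "data"
    for M in range(6):
        print(M, objective(y_e, t, 1, M))   # drops to ~0 from M = 3 onwards
```

Real recordings are noisy and only sampled over a finite time window, so in practice S(M) levels off at a non-zero value rather than vanishing, as in Table 1.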


Analysis of experimental data

The predictions of the NA model are used to describe the effect of cross-linking aldehydes on the amplitude of the compound action potential of the frog sciatic nerve (formaldehyde, f.a., 0.10%; crotonic aldehyde, c.a., 0.10%; glutaraldehyde, g.a., 0.25%) (refs. 9, 10).

The relaxation rate $\alpha$ is obtained by taking $\alpha = (n+1)/\bar{\tau}$, where $\bar{\tau}$ is the mean relaxation time of the studied system (45.85 min for f.a., 6.65 min for c.a. and 76.80 min for g.a.).
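As a worked example, the relation α = (n+1)/τ̄ can be read as follows: the last compartment of the chain, tⁿ e⁻ᵗ/n!, has mean dimensionless time n + 1, so matching it to the measured mean time τ̄ gives the rate. The sketch evaluates α for n = 1 (two compartments) and converts a real time τ to the dimensionless time t = ατ of eq. (1). The 30 min time point is an arbitrary illustration.

```python
# Mean relaxation times (min) quoted in the text.
MEAN_RELAXATION_TIME = {"f.a.": 45.85, "c.a.": 6.65, "g.a.": 76.80}


def relaxation_rate(mean_time, n):
    """alpha = (n + 1) / mean relaxation time, for n + 1 compartments in series."""
    return (n + 1) / mean_time


def dimensionless_time(tau, alpha):
    """t = alpha * tau, the dimensionless time used in eq. (1)."""
    return alpha * tau


if __name__ == "__main__":
    for name, tbar in MEAN_RELAXATION_TIME.items():
        alpha = relaxation_rate(tbar, n=1)   # two compartments
        print(f"{name}: alpha = {alpha:.4f} min^-1, "
              f"t(30 min) = {dimensionless_time(30.0, alpha):.2f}")
```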

Using (11) with $y_e^{(i)}(t)$ obtained from experiments, one obtains $q_m^{(i)}$ and the objective function $S(M)$ for different numbers of compartments, n+1, and of scales, M+1.

Table 1 contains the values of S(M), at n = 1, for different M. When M increases from 0 to 3 the sum of squares of deviations decreases sharply, while for values of M above 3 it remains practically constant. We concluded that two compartments and four time scales provide an optimal fit of the relaxation process for all cross-linking aldehydes.


TABLE 1

Objective functions S(M)

M        0       1       2       3       4       5
f.a.     4.33    .37     .22     .11     .10     .09
c.a.     .164    .044    .023    .013    .013    .012
g.a.     9.82    .21     .07     .026    .028    .025
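The choice of M from Table 1 was made by inspection. One way to make the "practically constant" criterion explicit is sketched below; the rule and its 10% tolerance are hypothetical choices, not the authors' procedure. It returns the smallest M beyond which one more time scale lowers S(M) by less than 10%, and with the tabulated values it selects M = 3 for all three aldehydes.

```python
# S(M) values from Table 1 (n = 1), for M = 0..5.
S_TABLE = {
    "f.a.": [4.33, 0.37, 0.22, 0.11, 0.10, 0.09],
    "c.a.": [0.164, 0.044, 0.023, 0.013, 0.013, 0.012],
    "g.a.": [9.82, 0.21, 0.07, 0.026, 0.028, 0.025],
}


def plateau_order(s_values, rel_tol=0.10):
    """Hypothetical rule: smallest M for which going to M + 1 lowers S
    by less than rel_tol * S(M)."""
    for m in range(len(s_values) - 1):
        if s_values[m] - s_values[m + 1] < rel_tol * s_values[m]:
            return m
    return len(s_values) - 1


if __name__ == "__main__":
    for name, s in S_TABLE.items():
        M = plateau_order(s)
        print(f"{name}: M = {M} ({M + 1} time scales)")
```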

The number n+1 of compartments appearing in the model could be viewed as the number of targets which must be "hit" by aldehydes in order to block impulse propagation in each nerve fibre. In this model the time scales correspond to parallel pathways with different permeabilities. This interpretation agrees with the existence of four repeated regions, or organizational levels, in the sodium channels (refs. 1, 2, 6, 7).

The fact that we obtained four time scales, two compartments and comparable coefficients $q_m$ (see Table 2) for all cross-linking aldehydes suggests that a similar mechanism of inactivation takes place in every case.


TABLE 2
Coefficients q_k

k        0       1       2       3
f.a.     .98     .054    -.155   -.115
c.a.     .98     .090    -.126   -.092
g.a.     .99     .033    -.194   -.150
FLUCTUATIONS IN MEMBRANES

Various biosystems exhibit electrical oscillations over bundles of cells in nerve and muscle tissues, with intermittent bursting or beating domains of behaviour. Similar phenomena have been observed in neural networks and in artificial membranes. For instance, quasi-periodicity of relaxation type in the interface electric potential of an oil/water system in the presence of surfactants has been reported (ref. 11). Self-excitation and intermittences also characterize porous doped membranes (ref. 12). In the following we refer to such porous membranes because of their similarity with biomembranes.

A simple model governing mass transport through a pore is:

$$\frac{dy}{dt} = -ky \tag{13}$$

where $y(t)$ is the concentration, $k$ is the transport coefficient, and $t$ is the time.

Such a model cannot explain the intermittent behaviour observed in natural and artificial membranes.

An approach based on an NA formalism is now considered. We suppose that, due to the extremely non-linear interactions characterizing transmembrane transport, a self-organization of movements, that is, a hierarchy of roll-cells as shown in Fig. 1, appears in the doped pore. Different levels in this hierarchy are indexed by m = 0, 1 and 2. With a suitable choice of the time unit, we consider that the transport time is $2^m$ for a cell of type m.

On account of the mechanism of transport through such structured pores, the time $t$ and the transport coefficient $k$ are considered as elements of a 2-series field (ref. 13). Denote these elements by $T$ and $K$, respectively. We have $T = \sum_j t_j 2^j$ and $K = \sum_j k_j 2^j$, where $j$ is a natural number and $t_j$ and $k_j$ are the digits 0, 1.

Observe that a hierarchical tree as shown in Fig. 2 can be associated in one-to-one correspondence with the hierarchy of roll-cells. The lower ends of the branches of this tree represent the possible times (000, 100, 010 and so on). They are denoted using precisely the dyadic notation (for instance, the time 011 signifies $0 \cdot 2^0 + 1 \cdot 2^1 + 1 \cdot 2^2$).

The dyadic expansions give detailed descriptions of the time and of the transport coefficient by cellular motions. The addition "⊕" and the multiplication "⊗" in this 2-series field are defined by:

$$T \oplus K = \sum_j \left[(t_j + k_j) \bmod 2\right] 2^j, \qquad T \otimes K = \sum_j t_j k_j\, 2^j \tag{14}$$


The addition rule is justified by the fact that two identical steps (at the same level in the hierarchy) should correspond to the situation in which no transport takes place, that is, to the null time. For instance, the forward step 001, denoted by f in Fig. 1, followed by the backward step 001, denoted by b, gives according to (14) 001 ⊕ 001 = 000.

The multiplication rule "⊗" is in fact a natural rule for coupling the transport potentialities $k_j$ and the durations $t_j$.

The solution of the model (13) in this 2-series field is:

$$y(T) = \exp(-K \otimes T) \tag{15}$$

Table 3 presents such solutions for $0 \le K, T \le 7$. Notice that by simple linear transformations the solutions reduce to the well-known Walsh functions (ref. 14), defined by:

$$W_m(K, T) = (-1)^{\sum_{j=0}^{m-1} k_j t_j} \tag{16}$$

with $K = k_0 2^0 + \dots + k_{m-1} 2^{m-1}$, $T = t_0 2^0 + \dots + t_{m-1} 2^{m-1}$ and $M = 2^m$ (in Table 3, $M = 8$). Consequently, the general calculated solution of (13) is written as:

$$y(T) = \sum_{K=0}^{M-1} q_K\, W_m(K, T) \tag{17}$$

The Walsh-Fourier coefficients $q_K$ depend on the contribution of the sequence $K$ (see ref. 14). They can be obtained from:

$$q_K \sum_{T=0}^{M-1} W_m^2(K, T) = \sum_{T=0}^{M-1} y(T)\, W_m(K, T) \tag{18}$$

where $y(T)$ is the experimentally recorded fluctuation.
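With integers standing for dyadic expansions, the operations above map onto bitwise operations. In the text's least-significant-digit-first notation, "001" is the integer 4, so 001 ⊕ 001 = 000 becomes 4 XOR 4 = 0. The exponent in (16) is the number of positions where both K and T have a 1, i.e. the population count of K AND T. The sketch treats ⊗ as the digit-wise product (bitwise AND), which is an assumed reading of (14). Because W² = 1, (18) reduces to q_K = (1/M) Σ_T y(T) W(K, T), and (17) then reconstructs y exactly when M is a power of 2.

```python
def dyadic_add(t, k):
    """Digit-wise addition mod 2 (the 'oplus' of eq. 14): bitwise XOR."""
    return t ^ k


def dyadic_mul(t, k):
    """Digit-wise product t_j * k_j (assumed reading of 'otimes' in eq. 14): bitwise AND."""
    return t & k


def walsh(k, t):
    """W_m(K, T) = (-1)**(sum_j k_j t_j), eq. (16); the sum is popcount(K & T)."""
    return -1 if bin(dyadic_mul(k, t)).count("1") % 2 else 1


def walsh_coefficients(y):
    """q_K from eq. (18); since W**2 = 1, sum_T W**2 = M. len(y) must be a power of 2."""
    M = len(y)
    return [sum(y[t] * walsh(k, t) for t in range(M)) / M for k in range(M)]


def walsh_synthesis(q):
    """y(T) = sum_K q_K W_m(K, T), eq. (17)."""
    M = len(q)
    return [sum(q[k] * walsh(k, t) for k in range(M)) for t in range(M)]


if __name__ == "__main__":
    y = [0.9, 0.1, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4]   # hypothetical fluctuation record, M = 8
    q = walsh_coefficients(y)
    print([round(c, 3) for c in q])
    print(walsh_synthesis(q))                      # reproduces y
```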

CONCLUSIONS 

The coefficients q obtained in both NA models developed here can be interpreted in the form of a spectrum giving a measure of the contribution of the different scales involved in the process.

The nearly-exponential and the intermittent relaxations studied here represent, in our opinion, suitable candidates for physically modelling biological drug action, neural networks and other biosystems of practical interest.

Note that the main point of the present analysis is that time is considered to pertain to some NA frames. This allows a large class of "exotic" relaxations to be described using very simple models such as (1) and (13). However, some of the manifestations of the NA methods are new, and it is still difficult to obtain physical insight into them.
SUMMARY

Molecular dynamics simulations of a small protein, barnase, in the presence of explicit water molecules are described, and results from a 120 ps trajectory of the system are analyzed.

The deviation of the average protein conformation from the starting crystal structure is small (~1.1 Å rms for backbone atoms), and the agreement between computed and crystallographic atomic fluctuations is satisfactory for the portions of the protein that do not participate in crystal contacts. Other parameters, such as the accessible surface area of the protein and its molecular volume, change by no more than 5% and 2%, respectively. The structure of water around polar and non-polar groups on the protein surface also seems reasonable, in that it agrees well with previous observations made in hydrated crystals or in simulations of small solvated molecules.

The thrust of our study concerns the use of the generated trajectory to obtain a detailed microscopic description of electrostatic interactions in the protein. Contributions to these interactions from permanent protein dipoles, from orientable solvent dipoles, and from electronic polarizability are evaluated. Although the analysis must still be qualified as preliminary, a number of clear trends emerge. The contribution of water to local fields inside the protein is substantial. In exposed parts of the protein, it affects field magnitude and field orientation by about 60% and 43 degrees, respectively, on average. In buried regions, where residues expose less than 10% of their surface to solvent, this contribution is reduced to 12% (in field magnitude) and 20 degrees (in field orientation), which makes it comparable to the contribution from electronic polarizability. The latter averages 13% in field magnitude and 12 degrees in field orientation, and is not sensitive to solvent exposure. Detailed analysis of both solvent and electronic induction effects shows that they display an appreciable degree of inhomogeneity throughout the protein matrix, suggesting that their relative importance may vary quite dramatically according to the local environment.
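Percentages and angles of this kind compare a local field computed with and without a given contribution. The sketch below assumes two definitions, since the summary does not spell them out: the magnitude effect is |(|E_new| - |E_ref|)| / |E_ref| in percent, and the orientation effect is the angle between the two field vectors. Fields are computed from point charges in arbitrary units (Coulomb constant and dielectric constant both set to 1); all names are hypothetical.

```python
import math


def coulomb_field(point, charges):
    """Electric field at `point` from point charges [(q, (x, y, z)), ...].

    Arbitrary units: Coulomb constant and dielectric constant are both 1.
    """
    ex = ey = ez = 0.0
    for q, (x, y, z) in charges:
        dx, dy, dz = point[0] - x, point[1] - y, point[2] - z
        r3 = (dx * dx + dy * dy + dz * dz) ** 1.5
        ex += q * dx / r3
        ey += q * dy / r3
        ez += q * dz / r3
    return (ex, ey, ez)


def field_change(e_ref, e_new):
    """Assumed metrics: relative magnitude change (%) and rotation angle (degrees)."""
    n_ref = math.sqrt(sum(c * c for c in e_ref))
    n_new = math.sqrt(sum(c * c for c in e_new))
    cos_angle = sum(a * b for a, b in zip(e_ref, e_new)) / (n_ref * n_new)
    cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against rounding
    return 100.0 * abs(n_new - n_ref) / n_ref, math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    site = (1.0, 0.0, 0.0)
    protein = [(1.0, (0.0, 0.0, 0.0))]                       # hypothetical protein charge
    water = [(-0.5, (2.0, 0.0, 0.0)), (0.25, (1.0, 1.5, 0.0))]  # hypothetical solvent charges
    e_p = coulomb_field(site, protein)
    e_pw = coulomb_field(site, protein + water)
    print(field_change(e_p, e_pw))
```

Averaging such per-site changes over trajectory frames and over residues grouped by solvent exposure would give statistics of the kind quoted above.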


INTRODUCTION 

Electrostatic interactions are among the most important factors in determining the structure and function of proteins. They are involved in enzymic mechanisms (1), allosteric control (2) and specific ligand binding (3-4), as well as in other essential phenomena such as protein stability (5) and folding (6). It is therefore important to be able to evaluate electrostatic energies and the forces acting upon atoms correctly, using available X-ray structures of proteins or conformations generated by computer simulations. This is, however, a difficult problem, owing to the inherent complexity of these interactions for solvated macromolecules and to the long range of electrostatic forces relative to the forces associated with other interactions. The major bottleneck lies in the proper representation of the dielectric or screening effects associated with the orientable permanent dipoles of the surrounding solvent molecules and with the inherent many-body nature of electronic polarizability. In present computational procedures, two main approaches are used to represent electrostatic interactions: a microscopic approach, usually incorporated in Molecular Mechanics and Molecular Dynamics simulations, in which electrostatic interactions for both the protein and solvent components are calculated by summing pairwise interactions between permanent dipoles and charges whose positions in space are known; and a macroscopic approach based on continuum electrostatic models, which requires solving the Poisson-Boltzmann equation. A microscopic approach in which a detailed representation of the protein atoms and explicit solvent molecules is combined with Molecular Dynamics simulations, to describe the time-dependent behaviour of the system, might in principle be expected to provide an adequate description of electrostatic properties. But, while this approach is potentially quite accurate, in practice it is computationally overwhelming. Because of the heavy computational requirements, many approximations have been introduced, resulting mostly in hybrid models that incorporate mixed microscopic-macroscopic representations. In these, solvent effects are mimicked by distance-dependent dielectric functions (7) or by Langevin-type dipoles (8). In more recent studies, explicit solvent molecules have been included, but contributions from electronic polarizability have not (9-11). The macroscopic continuum approaches, on the other hand, involve the simplifying assumption that protein interiors are homogeneous low-dielectric media, surrounded by solvent of high dielectric constant. First introduced by Tanford & Kirkwood in 1957 (12), these approaches have recently been generalized to non-spherical geometries with the implementation of numerical procedures for solving the Poisson-Boltzmann equation (13-15), extended to improve the treatment of solvation energies (16), and extended to include the effects of ionic strength (17). Recent results obtained by these methods in evaluating changes in the pKa of protein groups (18-19) and in redox potentials (20) have been encouraging. But the main limitations of the approach, in particular the validity of the uniform dielectric model for the protein interior and the precise influence of the non-physical description of the protein/solvent interface, are still not well understood.
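The difference between the plain microscopic treatment (ε = 1) and a distance-dependent dielectric of the kind used in hybrid models can be seen on a single charge pair. The sketch assumes the common choice ε(r) = r (r in Å) purely for illustration, and uses approximately 332.06 kcal Å mol⁻¹ e⁻² for the Coulomb conversion factor; the exact value differs slightly between programs.

```python
COULOMB_KCAL = 332.0636  # kcal*Angstrom/(mol*e^2), approximate conversion factor


def coulomb_energy(qi, qj, r, distance_dependent=False):
    """Pairwise Coulomb energy (kcal/mol) for charges in e at distance r (Angstrom).

    distance_dependent=False: constant dielectric eps = 1.
    distance_dependent=True: assumed eps(r) = r, so the energy falls off as 1/r^2.
    """
    eps = r if distance_dependent else 1.0
    return COULOMB_KCAL * qi * qj / (eps * r)


if __name__ == "__main__":
    for r in (3.0, 5.0, 10.0, 15.0):
        e_const = coulomb_energy(1.0, -1.0, r)
        e_dd = coulomb_energy(1.0, -1.0, r, distance_dependent=True)
        print(f"r = {r:4.1f} A  eps=1: {e_const:8.2f}  eps=r: {e_dd:8.2f} kcal/mol")
```

The screening factor grows linearly with distance, which crudely mimics solvent screening without any solvent molecules; this is exactly the kind of approximation the explicit-solvent simulations described here are meant to test.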

With the constant increase in available computer power, the obstacles encountered in implementing detailed microscopic representations of large molecular systems are gradually receding. It now becomes possible to envisage extensive testing of such models in computational procedures and to confront them with experimental data. In a recent study (21), we described the implementation of a microscopic model for evaluating electrostatic interactions in proteins which incorporates electronic polarizability effects. Following the approach of a number of authors (22-23), isotropic polarizabilities have been assigned to individual atoms, and the resulting departures from pairwise additivity have been treated by a self-consistent iterative procedure. This procedure was incorporated in Molecular Mechanics calculations, and methods for its practical use in Molecular Dynamics simulations were proposed. Moreover, the contributions of electronic polarizability to electrostatic potential energies, and more particularly to local electrostatic fields and to the dipole moments of structural elements in the protein, were thoroughly analyzed. The study did not, however, include contributions from solvent molecules or from protein flexibility, since computations were performed only on static protein crystal structures in vacuum. Here, we extend the analysis to include both contributions, and present highlights of our initial results. Molecular Dynamics simulations of a small protein, barnase, in the presence of explicit water molecules are used to generate molecular trajectories of the system. Although detailed analysis of the trajectories is still in progress, evidence is presented that they provide a very satisfactory description of the protein and water portions. The generated trajectories are then used to obtain a detailed microscopic description of electrostatic interactions in the protein. Contributions to these interactions from permanent protein dipoles, from orientable solvent dipoles (hereafter referred to as solvent polarization), and from electronic polarizability are analyzed.
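A self-consistent iterative treatment of isotropic atomic polarizabilities can be sketched as follows. This is a minimal point-dipole (Applequist-type) model without damping, in consistent but arbitrary units, and not the authors' implementation: each induced dipole responds to the permanent field plus the fields of all other induced dipoles, μ_i = α_i (E⁰_i + Σ_j T_ij μ_j), with T_ij = (3 r̂ r̂ᵀ − I)/r³, and the equations are iterated until the dipoles stop changing.

```python
import numpy as np


def induced_dipoles(positions, alphas, e0, tol=1e-10, max_iter=200):
    """Self-consistent isotropic point-dipole model (no damping, arbitrary units).

    positions: (N, 3) coordinates; alphas: (N,) polarizabilities;
    e0: (N, 3) field from the permanent charges at each polarizable site.
    Returns mu (N, 3) with mu_i = alpha_i * (E0_i + sum_{j != i} T_ij mu_j).
    """
    pos = np.asarray(positions, dtype=float)
    alpha = np.asarray(alphas, dtype=float)
    e0 = np.asarray(e0, dtype=float)
    n = len(pos)
    # Dipole field tensors T_ij = (3 r r^T / r^2 - I) / r^3.
    tensors = np.zeros((n, n, 3, 3))
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pos[i] - pos[j]
                d = np.linalg.norm(r)
                tensors[i, j] = (3.0 * np.outer(r, r) / d ** 2 - np.eye(3)) / d ** 3
    mu = alpha[:, None] * e0                   # first guess: response to E0 only
    for _ in range(max_iter):
        e_ind = np.einsum("ijab,jb->ia", tensors, mu)
        new_mu = alpha[:, None] * (e0 + e_ind)
        if np.max(np.abs(new_mu - mu)) < tol:
            return new_mu
        mu = new_mu
    raise RuntimeError("induced dipoles did not converge")


if __name__ == "__main__":
    # Two polarizable sites 3 units apart in a uniform field along the line joining them.
    mu = induced_dipoles([[0, 0, 0], [3, 0, 0]], [1.0, 1.0], [[1, 0, 0], [1, 0, 0]])
    print(mu)   # each dipole is enhanced by mutual polarization: alpha*E0 / (1 - 2*alpha/r^3)
```

In a molecular dynamics setting the converged dipoles from the previous step are a natural starting guess, which reduces the number of iterations per step.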

MOLECULAR  DYNAMICS  SIMULATIONS  OF  BARNASE  IN  WATER 

A time-dependent trajectory of the protein barnase in the presence of explicit solvent molecules is computed by solving Newton's equations of motion for each particle in the system, with forces evaluated as the negative gradient of a classical empirical potential energy function.

The simulations are performed using recently implemented vectorized procedures in the BRUGEL package (24), in the microcanonical ensemble (N, V, E) at room temperature (<T> = 304 K). The force field is derived from a recent version of the CHARMM potentials that includes explicit contributions from aliphatic hydrogens in the protein portion. The solvent is modeled by the three-center charge TIPS model (25) with atomic charges q(O) = -0.814 and q(H) = +0.407 esu, and van der Waals parameters ε = -0.1521 kcal/mol and σ = 3.1506 Å. Electrostatic interactions are modeled by the usual Coulomb potential with a dielectric constant ε = 1. Effects due to induced dipoles are not included in generating the trajectories but are applied afterwards to computed conformations in a perturbation-like approach. The starting conformation of the system consisted of one of the three molecules (molecule C) in the asymmetric unit of the 2 Å resolution refined crystal structure of barnase (26), the crystallographically determined water positions located within 4 Å of a protein atom, and randomly oriented water molecules placed on a cubic lattice in a rectangular box (dimensions: 49.68 Å × 37.16 Å × 49.68 Å). The system contains 8777 atoms. These include 1700 protein atoms, comprising all hydrogen positions generated using standard bond distances and angles (27), 94 crystallographically positioned water molecules, and 2265 generated waters. With the applied boundary conditions, atoms of proteins in adjacent boxes are separated by at least three water layers. Relevant simulation data and parameters are summarized in Table 1.
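The atom count quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes, as the three-center water model implies, three atoms per water molecule; the function and constant names are illustrative only.

```python
# Bookkeeping check for the solvated barnase system described in the text.
PROTEIN_ATOMS = 1700      # including generated hydrogens
CRYSTAL_WATERS = 94       # crystallographically positioned waters
GENERATED_WATERS = 2265   # waters placed on a cubic lattice
ATOMS_PER_WATER = 3       # assumption: one O and two H per three-center water


def total_atoms(protein: int, waters: int, per_water: int = ATOMS_PER_WATER) -> int:
    """Total atom count of a protein solvated by three-site water molecules."""
    return protein + per_water * waters


def box_volume(lx: float, ly: float, lz: float) -> float:
    """Volume of the rectangular periodic box, in cubic angstroms."""
    return lx * ly * lz


n_waters = CRYSTAL_WATERS + GENERATED_WATERS
n_atoms = total_atoms(PROTEIN_ATOMS, n_waters)
print(n_waters, n_atoms)  # prints: 2359 8777
```

The result, 1700 + 3 × 2359 = 8777, agrees with the total given in the text.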


TABLE 1

Protein-Solvent Simulation Conditions

Thermodynamic ensemble:     Microcanonical (N, V, E)
Integration algorithm:      Verlet (40)
Constraints*:               protein: bond distances; water: bond distances and bond angles
Integration time step:      0.002 ps
Long-range interactions:    7 Å cutoff distance
Shifting function:          MEM**
Periodic boundary:          box dimensions 49.68 × 37.16 × 49.68 Å
Thermalization:             velocity rescaling, 4 ps
Equilibration:              10 ps

Simulations have been performed using the BRUGEL package (24) and code vectorized to run on the Convex C1 computer. Constraints* are applied using the SHAKE procedure (41), and MEM** is the shifting function previously tested using Integral Equation methods (28) (see text).
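Table 1 combines a rectangular periodic box with a 7 Å atom-atom cutoff. The text does not spell out how periodic images were handled, so the following is a minimal sketch that assumes the standard minimum-image convention. Because 7 Å is well below half of the shortest box edge (37.16 / 2 = 18.58 Å), each pair can interact with at most one image. The function names are hypothetical.

```python
import math

BOX = (49.68, 37.16, 49.68)  # periodic box edge lengths in angstroms (Table 1)
CUTOFF = 7.0                 # radial atom-atom cutoff in angstroms (Table 1)


def minimum_image_distance(a, b, box=BOX):
    """Distance between points a and b using the nearest periodic image."""
    d2 = 0.0
    for ai, bi, edge in zip(a, b, box):
        d = ai - bi
        d -= edge * round(d / edge)  # wrap the component into [-edge/2, edge/2]
        d2 += d * d
    return math.sqrt(d2)


def within_cutoff(a, b, rc=CUTOFF, box=BOX):
    """True if the pair is inside the truncation radius."""
    return minimum_image_distance(a, b, box) < rc


# Two atoms near opposite faces of the box are in fact close neighbours.
a = (0.5, 10.0, 10.0)
b = (49.0, 10.0, 10.0)
print(round(minimum_image_distance(a, b), 2))  # 1.18
```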

Particularly noteworthy is the chosen treatment of long-range (Coulomb) interactions. This treatment consists of applying a 7 Å radial atom-atom cutoff distance together with a modified Coulomb interaction active over the entire distance range:


$$U'(r) = U(r)\,S(r) \qquad (1)$$


where U' and U are, respectively, the modified and unmodified Coulomb potentials, and r is the interatomic distance. S(r) is the MEM shifting function (28), which has the following form:

$$S(r) = 1 - 2\left(\frac{r}{r_c}\right) + \left(\frac{r}{r_c}\right)^{2}, \qquad r < r_c \qquad (2)$$
where r_c is the cutoff distance.
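The shifted interaction of Eqs. (1) and (2) can be sketched as follows. Two points are assumptions and not taken from the text. First, the sketch reads Eq. (2) as a function of r/r_c, which is how the garbled original most plausibly reconstructs. Second, it uses a Coulomb conversion factor of about 332.06 kcal·Å/(mol·e²). Under this reading S(r) = (1 - r/r_c)², so both S and dS/dr vanish at the cutoff. Interactions beyond r_c are simply truncated.

```python
COULOMB_KCAL = 332.0636  # assumed conversion factor, kcal*angstrom/(mol*e^2)


def mem_shift(r: float, rc: float) -> float:
    """Eq. (2) as read here: S(r) = 1 - 2(r/rc) + (r/rc)^2 for r < rc, 0 beyond."""
    if r >= rc:
        return 0.0
    x = r / rc
    return 1.0 - 2.0 * x + x * x


def coulomb(qi: float, qj: float, r: float, eps: float = 1.0) -> float:
    """Unmodified Coulomb energy U(r) in kcal/mol (charges in e, r in angstroms)."""
    return COULOMB_KCAL * qi * qj / (eps * r)


def shifted_coulomb(qi: float, qj: float, r: float, rc: float = 7.0, eps: float = 1.0) -> float:
    """Modified potential of Eq. (1): U'(r) = U(r) * S(r)."""
    return coulomb(qi, qj, r, eps) * mem_shift(r, rc)


# Example: O-H pair of two TIPS waters 3.0 angstroms apart, 7 angstrom cutoff.
print(shifted_coulomb(-0.814, 0.407, 3.0))
```

Multiplying the full Coulomb term by a smooth factor, rather than cutting it abruptly, avoids the energy and force discontinuities a bare truncation would introduce at r_c.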
In a recent study (29), several truncation schemes for long-range interactions have been calibrated against the Ewald-Kornfeld summation method (30) in simulations of pure liquid water. It was shown that the MEM function performs best with three-center models such as SPC and TIPS, with respect to both structural and thermodynamic properties; hence the choice of MEM for the present protein-water simulations, in which the water is expected to play an important role.

Analysis  of  simulation  results 

Results reported here are based on a 120 ps (1 ps = 10^-12 s) trajectory of the system described above. According to the usual criteria, such as convergence of the average potential and kinetic energy components, the size of the energy fluctuations (< 1%), and convergence of the rms deviations from the crystal structure, the system appears to be at equilibrium.
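The equilibration criteria listed above reduce to simple statistics on an energy time series. The sketch below is a minimal version that rests on stated assumptions. It takes the "< 1%" criterion to mean the standard deviation of the total energy divided by the magnitude of its mean. It also uses a first-half versus second-half comparison of means as one hypothetical way to test convergence of the averages. Neither definition is spelled out in the text.

```python
import statistics


def relative_fluctuation(energies):
    """Standard deviation of an energy series divided by |mean|."""
    mean = statistics.fmean(energies)
    return statistics.pstdev(energies) / abs(mean)


def looks_equilibrated(energies, tol=0.01):
    """Hypothetical criterion: relative energy fluctuation below tol (1%)."""
    return relative_fluctuation(energies) < tol


def half_mean_drift(series):
    """Relative difference between the means of the first and second halves."""
    n = len(series) // 2
    m1 = statistics.fmean(series[:n])
    m2 = statistics.fmean(series[n:2 * n])
    return abs(m2 - m1) / abs(statistics.fmean(series))


# Total energies (kcal/mol) sampled along a trajectory (illustrative numbers).
energies = [-1000.0, -1002.0, -998.0, -1000.0]
print(looks_equilibrated(energies), half_mean_drift(energies))
```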

Protein conformation and atomic fluctuations

The average conformation of the protein was computed from the 120 ps trajectory and compared to the starting crystal structure using coordinate superpositions (31). The computed rms deviation of backbone atoms is 1.13 Å, and that of all the atoms in the structure (including aliphatic hydrogens) is 1.46 Å. This is the lowest value reported so far for simulations of solvated proteins, where rms deviations of Cα atoms usually range around 1.8-1.9 Å. The exception is the recently reported simulation by Levitt (11), in which rms deviations of similar size (1.18 Å) were obtained for non-hydrogen atoms of Bovine Pancreatic Trypsin Inhibitor in a 200 ps MD simulation. Preliminary results of a detailed comparison between the protein conformation averaged over the 120 ps trajectory and the crystal structure show that structural parameters such as backbone dihedral angles and H-bonding interactions are in general very well conserved. The correspondence between the B-factors (also called temperature factors (32)) of barnase computed from the 120 ps MD simulation and those obtained from X-ray data is illustrated in Fig. 1. It is in general satisfactory, particularly in regions not involved in crystal contacts, where actual values of B-factors are often closely similar. Regions involved in crystal contacts, on the other hand, display large discrepancies between computed and crystal B-factors, with the latter always being smaller. While increased flexibility in such regions can be expected upon exposure to solvent, the large magnitude of some of the observed differences suggests that other phenomena, such as incomplete conformational relaxation in these regions, may be at play. Other parameters, such as the protein accessible surface area and molecular volume, have also been monitored along the trajectory (Fig. 2). These were computed using an analytic algorithm implemented in the BRUGEL package (Alard et al., in preparation). After 120 ps, the accessible surface area and the molecular volume of barnase show only modest changes, of -5% and -2% respectively (values based on comparison of the final structure with the crystal structure). While it may be premature to draw definitive conclusions, our results suggest that the behavior of the protein part in the presence of water is very satisfactory. To assess whether this should be attributed to artefacts in the force field (which would act to curb substantial departures from the crystal structure) or to a beneficial effect from adequately modeled surrounding solvent molecules, results from this simulation should be compared to those from vacuum simulations (presently in progress).
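The two structural comparisons used above can be sketched in a few lines: an rms deviation between two coordinate sets, and a B-factor derived from an atom's positional fluctuations. This is a minimal illustration that rests on stated assumptions. The coordinate superposition of reference (31) is assumed to have been carried out beforehand, so no fitting is done here. The computed B-factors are assumed to follow the standard isotropic relation B = (8π²/3)⟨|Δr|²⟩, which the text does not state explicitly.

```python
import math


def rmsd(coords_a, coords_b):
    """RMS deviation (angstroms) between two coordinate sets that have
    already been superposed; no fitting is performed here."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))


def b_factor(positions):
    """Isotropic B-factor (angstrom^2) of one atom from its positions in
    superposed frames: B = (8 pi^2 / 3) * <|r - <r>|^2>."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    msf = sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions) / n
    return 8.0 * math.pi ** 2 / 3.0 * msf


print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
print(round(b_factor([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]), 2))  # 26.32
```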


(Figure 1 plot: B-factors averaged on main chain, plotted against residue number)


Fig 1: Comparison of B-factors of Cα atoms of barnase computed from the 120 ps simulation and from the refinement of the 2 Å resolution X-ray structure of barnase. The ordinate shows B-factor magnitudes in units of Å². The abscissa shows residue numbers in barnase. Residues that bury 10 Å² or more of surface area in an interaction with another molecule in the crystal of barnase are indicated.
Fig 2: Accessible surface areas (a) and molecular volume (b) of barnase conformations along the 120 ps molecular dynamics trajectory.

Accessible areas and molecular volumes have been computed using an analytic algorithm implemented in BRUGEL (Alard et al., in preparation).

Water  structure  around  the  protein 

Molecular Dynamics simulations can provide a very accurate picture of the structure and dynamics of water molecules close to and in contact with the protein surface. A number of recent studies (9-11) are already helping to change our view. In the view provided by X-ray and neutron diffraction studies, hundreds of water molecules are considered to be statically bound to the protein. In the emerging view, water at the protein surface is generally not very different from bulk water, implying that far fewer water molecules are truly immobilized on the protein surface. Here we present preliminary results on the analysis of water structure at the surface of barnase as seen in our simulation. Radial distributions of water oxygens and hydrogens around specific protein atoms in non-polar, polar and charged groups have been computed for exposed side chains, as determined from solvent accessibility calculations (34). A sample of these distributions is shown in Fig. 3(a-d). Positions of maxima in the radial distribution functions involving polar and charged protein groups are within the expected H-bonding distances of the groups involved. A good example is the distribution of water oxygens around the Oγ group of Asn-22, shown in Fig. 3a. The maximum of this distribution occurs at 2.63 Å, in good agreement with results from previous simulations (35). Further analysis of water structure around polar groups shows that acceptor-proton distances are significantly shorter when the proton belongs to a water molecule than when it belongs to the protein. This can be seen, for example, from the peak positions of g(r) involving the OH group of Tyr-17 in barnase (Fig. 3(c-d)). Since we evaluate the Coulomb energy to be roughly the same in both cases, this may be due to the lack of van der Waals parameters for the water hydrogens in the TIPS model used here. In addition, we find that proton-acceptor distances in our simulation are generally shorter than those found in small hydrated crystals (36), but the reasons for this are presently not clear. Radial distributions of water atoms around non-polar groups have also been analyzed. The peak in the water oxygen distribution around the non-polar methyl group of Ala-32 in barnase (Fig. 3b) occurs at 3.5 Å. This is a somewhat shorter distance than those obtained previously in simulations of dilute solutions of alkanes and peptides: 3.6 Å for the methane-water first peak (37), and 3.7 Å for both the butane-water (38) and peptide methyl-water (35) first peaks.
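A radial distribution function of water sites around a single protein atom, as used for Fig. 3, can be estimated by histogramming distances over frames. The shell counts are then normalized by the bulk density times the shell volume. The sketch below is a minimal, hypothetical implementation. It assumes minimum-image distances in the periodic box and takes the bulk water-oxygen number density as a parameter, defaulting to about 0.0334 per Å³, roughly that of water at 1 g/cm³. The text does not state which normalization was used.

```python
import math


def site_rdf(center_frames, water_frames, box, r_max=8.0, dr=0.1, rho=0.0334):
    """g(r) of water sites around one protein atom.

    center_frames: one (x, y, z) position of the protein atom per frame.
    water_frames:  one list of water-site positions per frame.
    rho:           assumed bulk number density of the water site (per A^3).
    """
    nbins = int(round(r_max / dr))
    counts = [0] * nbins
    for center, waters in zip(center_frames, water_frames):
        for w in waters:
            d2 = 0.0
            for ci, wi, edge in zip(center, w, box):
                d = wi - ci
                d -= edge * round(d / edge)  # minimum image
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), nbins - 1)] += 1
    n_frames = len(center_frames)
    radii, g = [], []
    for k, n in enumerate(counts):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        radii.append(0.5 * (r_lo + r_hi))
        g.append(n / (n_frames * rho * shell))  # ratio to ideal-gas count
    return radii, g
```

The position of the first maximum of g is then compared with the H-bond or contact distances discussed above.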

A useful pictorial representation of water structure around specific protein side chains can be obtained by representing water molecules from individual snapshots along the trajectory in a common local reference frame attached to the corresponding side chain. Examples of such representations were given during the oral presentation but are not included here, since they require colour. Further analysis of the structure of water at the protein surface, as well as a detailed study of the dynamic properties of water as a function of its distance to the protein, is in progress.
Fig 3: Radial distribution functions g(r) for water atoms around solvent-exposed groups of barnase.

(a) g(r) of water oxygens (Ow) and the Oγ atom in the carbonyl group of Asn-22. (b) g(r) of water oxygens and the Cβ atom of Ala-32. (c) g(r) of water hydrogens and the hydroxyl oxygen of Tyr-17, and (d) of water oxygens and the hydroxyl hydrogen of Tyr-17.


ELECTROSTATIC  INTERACTIONS  IN  A  SOLVATED  FLEXIBLE  PROTEIN 

Contributions  from  electronic  polarizability 

Having convinced ourselves that the simulations yield, at least to a first approximation, a reasonable description of the protein and water portions, we proceed to analyze the contributions from electronic polarizability. This is a first step towards the main purpose of our study: a detailed microscopic description of the different contributions to electrostatic interactions in solvated proteins. The contributions of induced dipoles to electrostatic interactions in static protein crystal structures in vacuum have been analyzed previously (21). It was shown that induced protein dipoles appreciably affect local electrostatic fields, in magnitude and direction, in a manner that is strongly influenced by the microscopic environment in the protein. But since the effects of solvent and protein flexibility were not considered then, the possibility remained that induction effects and their inhomogeneity had been overestimated. This is one of the questions we address here, by analyzing the effects of protein and solvent dynamics on the contribution from induced dipoles to local fields inside the protein. For that purpose, averages and fluctuations of induced dipoles and of local fields at atomic positions are computed in conformations generated during the simulation. This perturbation-like approach is not a substitute for the correct but more time-consuming procedure in which induced dipole effects are included in the simulations. It is used here only as a first approximation, to obtain indications of possible trends.

To calculate induced dipoles and their contributions to local fields, we use procedures described previously (21), in which isotropic polarizabilities are assigned to individual atoms and the resulting deviation from pairwise additivity is treated by a self-consistent iterative procedure.

In this formalism, the total local field acting, say, on a protein atom i is expressed as follows:

$$\mathbf{E}_i = \mathbf{E}_i^{q} + \mathbf{E}_i^{\mu} \qquad (3)$$


The first term on the right-hand side, E_i^q, is the electrostatic field on atom i due to all the partial charges (or permanent dipoles) of the system, excluding itself:
The second term in Eqn (3), E_i^μ, is the electrostatic field acting on the ith dipole (or atom) due to all the induced dipoles in the system, excluding itself:

$$\mathbf{E}_i^{\mu} = \sum_{j \neq i} \frac{3\,(\boldsymbol{\mu}_j \cdot \hat{\mathbf{r}}_{ij})\,\hat{\mathbf{r}}_{ij} - \boldsymbol{\mu}_j}{r_{ij}^{3}}$$

with μ_j = α_j E_j, α_j the scalar atomic polarizability, r_ij the interatomic distance, and r̂_ij the unit vector along the line joining atoms i and j. Values of atomic polarizabilities are taken from the CHARMM library (39).

Interactions between atoms separated by 2 covalent bonds or less are not considered.
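The self-consistent scheme built on Eqn (3) can be sketched as a simple fixed-point iteration: start from μ_i = α_i E_i^q, then repeatedly set μ_i = α_i (E_i^q + E_i^μ) until the dipoles stop changing. This minimal sketch is not the authors' code. For brevity it omits the cutoff and shifting function and works in units with ε = 1 (charges in e, distances in Å, polarizabilities in Å³). It represents the excluded 1-2 and 1-3 pairs as a set of `frozenset` index pairs. All function names are hypothetical.

```python
import math


def _vec(a, b):
    """Vector a - b and its length."""
    d = [a[k] - b[k] for k in range(3)]
    return d, math.sqrt(sum(c * c for c in d))


def charge_field(pos, q, excluded):
    """E^q_i = sum_{j != i} q_j r_ij / r_ij^3, with r_ij = r_i - r_j."""
    n = len(pos)
    field = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or frozenset((i, j)) in excluded:
                continue
            d, r = _vec(pos[i], pos[j])
            for k in range(3):
                field[i][k] += q[j] * d[k] / r ** 3
    return field


def dipole_field(pos, mu, excluded):
    """E^mu_i = sum_{j != i} [3 (mu_j . u_ij) u_ij - mu_j] / r_ij^3."""
    n = len(pos)
    field = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or frozenset((i, j)) in excluded:
                continue
            d, r = _vec(pos[i], pos[j])
            u = [c / r for c in d]
            mu_dot_u = sum(mu[j][k] * u[k] for k in range(3))
            for k in range(3):
                field[i][k] += (3.0 * mu_dot_u * u[k] - mu[j][k]) / r ** 3
    return field


def induced_dipoles(pos, q, alpha, excluded=frozenset(), tol=1e-10, max_iter=500):
    """Iterate mu_i = alpha_i (E^q_i + E^mu_i) to self-consistency."""
    n = len(pos)
    e_q = charge_field(pos, q, excluded)
    mu = [[alpha[i] * e_q[i][k] for k in range(3)] for i in range(n)]
    for _ in range(max_iter):
        e_mu = dipole_field(pos, mu, excluded)
        new_mu = [[alpha[i] * (e_q[i][k] + e_mu[i][k]) for k in range(3)]
                  for i in range(n)]
        change = max(abs(new_mu[i][k] - mu[i][k]) for i in range(n) for k in range(3))
        mu = new_mu
        if change < tol:
            return mu, e_q, dipole_field(pos, mu, excluded)
    raise RuntimeError("induced dipoles did not converge")


# A unit charge flanked by two polarizable sites (alpha = 1 A^3) at +/-3 A.
pos = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (-3.0, 0.0, 0.0)]
mu, e_q, e_mu = induced_dipoles(pos, q=[1.0, 0.0, 0.0], alpha=[0.0, 1.0, 1.0])
print(mu[1][0], 12 / 109)  # the two numbers agree to within the tolerance
```

For this symmetric test case the fixed point can be solved by hand, m = 1/9 - m/108, which gives m = 12/109. The iteration converges quickly because mutual induction at 6 Å is weak.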

Time averages and fluctuations of the relative orientations and sizes of induced fields E^μ versus fields due to partial charges E^q are computed at atomic positions in protein conformations generated during the 120 ps trajectory. We find that the distributions of the atomic time-averaged ratio of field magnitudes <E^μ/E^q> remain very similar for the same trajectory of barnase in the presence (Fig. 4a) and in the absence (Fig. 4b) of surrounding water molecules (computed using the same distance cutoff values and shifting function). The mean values of these ratios are 0.27 and 0.30, respectively. Moreover, these distributions closely resemble the one obtained for the ratio E^μ/E^q in the static crystal structure of barnase in vacuum (Fig. 4c), where the mean is 0.31. This indicates that neither protein flexibility nor the presence of surrounding water molecules averages out electronic polarizability effects inside the protein, at least not on the time scale of the present simulation. This is further confirmed by the distributions of the atomic averages of the relative orientations between induced and permanent field vectors, shown in Fig. 5a-c. These distributions have been computed for barnase conformations along the same MD trajectory in the presence and in the absence of solvent, and for the static crystal structure in vacuum. All three are equally broad and follow the shape of a random distribution. This, however, should not be taken to imply that water has little influence on induced fields at individual locations in the protein. Indeed, we show below that water has an appreciable effect on local Coulomb fields (fields due to partial charges), and this in turn affects induced dipoles. Fig. 6a shows the distribution of atomic time-averaged ratios <E^μ(p+w)/E^μ(p)>, where E^μ(p) and E^μ(p+w) are the magnitudes of the local induced field in the absence and in the presence of water molecules, respectively, computed along the 120 ps trajectory. On average, the solvent increases the magnitude of induced dipoles by about 13% (mean of the distribution) and changes the orientation of the induced dipole component by an average of 45 degrees (Fig. 6b). The corresponding time-dependent fluctuations are 5% for the ratio E^μ(p+w)/E^μ(p) and 25 degrees for the angle (distributions not shown).
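The per-atom quantities plotted in Figs. 4-7 are time averages of a magnitude ratio and of an angle between two field vectors at the same atom. Examples are E^μ versus E^q, or the field with and without water. A minimal sketch follows, assuming the field vectors for each frame and atom are already available as nested lists. The function names are hypothetical. The ratio is averaged frame by frame, matching the <A/B> notation, rather than computed as a ratio of averages.

```python
import math
import statistics


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def angle_deg(a, b):
    """Angle between two vectors, in degrees."""
    cos_ab = sum(x * y for x, y in zip(a, b)) / (_norm(a) * _norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_ab))))


def atomic_time_averages(fields_a, fields_b):
    """fields_a[t][i], fields_b[t][i]: field vectors on atom i in frame t.

    Returns, per atom: (time-averaged |A|/|B|, time fluctuation of |A|/|B|,
    time-averaged angle between A and B in degrees)."""
    n_frames, n_atoms = len(fields_a), len(fields_a[0])
    results = []
    for i in range(n_atoms):
        ratios = [_norm(fields_a[t][i]) / _norm(fields_b[t][i]) for t in range(n_frames)]
        angles = [angle_deg(fields_a[t][i], fields_b[t][i]) for t in range(n_frames)]
        results.append((statistics.fmean(ratios), statistics.pstdev(ratios),
                        statistics.fmean(angles)))
    return results


# One atom, two frames (illustrative vectors).
a = [[(1.0, 0.0, 0.0)], [(0.0, 2.0, 0.0)]]
b = [[(2.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)]]
print(atomic_time_averages(a, b))
```

Histogramming the first and third entries over all atoms gives distributions of the kind shown in the figures.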

A better measure of the influence of electronic polarizability can be obtained from its effect on the total local field. We have therefore analyzed distributions of the time-averaged ratios <E^t/E^q>. Here E^t is the magnitude of the local total electrostatic field (containing contributions from both permanent and induced dipoles), and E^q is the magnitude of the local field due to permanent dipoles alone (see also Eqn 3). These distributions are not shown here since they closely resemble those obtained previously (21). They confirm that the contribution from electronic polarizability to the local total fields inside the protein is small on average, with the means of the distributions ranging between 7 and 14%. Because of a broad tail of the distribution towards higher values, however, it remains significant for an appreciable fraction of the atoms.


(Figure 4 plots: distributions of <E^μ/E^q>, atomic time averages over 120 ps)


Fig 4: Statistical analysis of the relative magnitudes of local electrostatic fields due to induced dipoles, E^μ, and to permanent dipoles (or partial charges), E^q, computed at atomic positions in barnase.

(a) Distribution of the ratios between magnitudes of fields due to induced dipoles and fields due to partial charges, computed at atomic positions i and averaged over the 120 ps trajectory generated by the simulation. Both E^μ and E^q include contributions from surrounding solvent; however, E^μ does not include contributions from induced water dipoles.

(b) Same as in (a), but E^μ and E^q are computed from the barnase trajectory without including contributions from surrounding water molecules.

(c) Distribution of the ratio E^μ/E^q in the static crystal structure of barnase in vacuum. All calculations are performed with ε = 1, full charges on ionizable groups, and a cutoff distance of 8 Å together with the shifting function described in the text.
Fig 5: Relative orientations between local field components due, respectively, to induced dipoles (E^μ) and permanent dipoles (E^q) in barnase.

(a) Distribution of the average angle between the field vectors computed from the last 25 ps of the simulation in the presence of water molecules. (b) As in (a), but the water contribution to electrostatic fields is not included. (c) Distribution of the angles between the same field components computed in the static crystal structure of barnase in vacuum, with computational parameters as for the results presented in Fig. 4c.


Fig 6: Influence of solvent on the induced dipoles inside the protein. (a) Distribution of the average ratio of induced field magnitudes at atomic positions computed along the 120 ps trajectory of barnase, <E^μ(p+w)/E^μ(p)>, where the numerator is the induced field on an atom i in the presence of all protein and solvent atoms, while the denominator is the same field computed in the absence of solvent. (b) Distribution of the average angle between the corresponding induced field vectors described in (a).




Contributions from solvent polarization to local fields inside the protein

A correct description of the dielectric properties and electrostatic interactions in solvated proteins requires proper evaluation of the contributions from surrounding solvent molecules. These contributions originate in the highly polar character of the water molecules, which both reorient in response to the field generated by the protein and electronically polarize their surroundings. In addition, water of course also exerts an indirect influence through its effects on protein structure and dynamics. Our MD simulations, performed with a detailed microscopic representation of both the protein and the water portions, together with the procedure for computing induction effects, are ideally suited for obtaining a detailed picture of these effects. Data on the influence of water molecules on induced dipoles in the protein were presented in the previous section. Here, we turn our attention to the contribution of surrounding water to electrostatic fields due to partial charges inside the protein. To evaluate this contribution, we compute time averages of the ratios of electrostatic field magnitudes <E^q(p+w)/E^q(p)> at atomic positions in the protein for conformations along the generated 120 ps trajectory. E^q(p+w) is the local field magnitude generated by permanent dipoles of the protein and the surrounding solvent, and E^q(p) is the local field magnitude generated by protein dipoles alone. Note that effects of electronic polarizability are not included here. The distribution of these ratios is shown in Fig. 7a, and the corresponding standard deviations (or fluctuations) are shown in Fig. 7b. From the distribution in Fig. 7a, we see that water increases the magnitude of local fields at protein atoms by a substantial amount. The increase is 28% on average (mean value of 1.28 for the averaged ratios of field magnitudes), with a sizable fraction of cases displaying increases of 100% or more. The standard deviations of the same field ratios are also large, with a mean value of 0.44 (Fig. 7b). Surrounding water is also found to have an appreciable effect on the orientation of local fields due to permanent dipoles inside the protein. This is illustrated by the distribution of the average relative orientation between local field vectors due, respectively, to permanent dipoles of the entire system (protein + water) and to those of the protein alone (Fig. 7c). This distribution has a mean of 32 degrees, but it is broad, with a substantial tail reaching values as high as 140 degrees, indicating that surrounding water can in some cases nearly reverse the direction of the local field. These results confirm the very important contribution of surrounding water to electrostatic fields inside the protein. They also show that the major part of this contribution is, as expected, in the component due to permanent water dipoles. The broad shapes of the computed distributions, and preliminary attempts to relate them to structural parameters, lead us to assume that solvent contributions are not evenly distributed throughout the protein matrix. Structural features of the protein, its degree of exposure to solvent, and structural and/or dynamic properties of surrounding water molecules may all strongly influence these contributions. Indeed, preliminary analysis (see Table 2) shows that in exposed parts of the protein (atoms that expose more than 90% of their surface to solvent), the contribution from water polarization reaches, on average, 60% in the magnitude and 43 degrees in the orientation of local fields. However, in buried regions (atoms that expose less than 10% of their surface to solvent), this contribution is reduced on average to 12% (in field magnitude) and 20 degrees (in field orientation). On the other hand, the contribution from electronic polarizability, which averages 13% in field magnitude and 12 degrees in field orientation, is found to be insensitive to the degree of exposure to solvent. Hence, electronic and solvent polarization effects are (on average) of comparable size and importance in buried regions, while in exposed regions solvent polarization, as expected, clearly dominates.
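The exposed/buried comparison above amounts to grouping per-atom averages by each atom's fractional solvent exposure. It uses the thresholds given in the notes to Table 2: more than 90% exposed for (a), less than 10% for (b). A minimal sketch follows, assuming the exposure fractions (0 to 1) have already been computed for each atom, e.g. by an accessibility calculation. The function name is hypothetical.

```python
import statistics


def class_means(values, exposure, exposed_min=0.90, buried_max=0.10):
    """Mean and spread of per-atom averages for exposed (a), buried (b)
    and all atoms. exposure[i] is atom i's accessible surface fraction."""
    groups = {
        "a": [v for v, f in zip(values, exposure) if f > exposed_min],
        "b": [v for v, f in zip(values, exposure) if f < buried_max],
        "all": list(values),
    }
    return {name: (statistics.fmean(g), statistics.pstdev(g))
            for name, g in groups.items() if g}


# Illustrative per-atom ratios <E^q(p+w)/E^q(p)> and exposure fractions.
ratios = [1.5, 1.6, 1.1, 1.1, 1.3]
exposure = [0.95, 0.92, 0.05, 0.0, 0.5]
print(class_means(ratios, exposure))
```

Atoms with intermediate exposure enter only the "all" class, which is why the (a) and (b) counts need not add up to the total number of atoms.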


(Figure 7 plots: (a) <E^q(p+w)/E^q(p)>; (b) <σ(E^q(p+w)/E^q(p))>; (c) <Angle(E^q(p+w), E^q(p))> in degrees)

Fig 7: Contributions of water molecules to local fields due to permanent dipoles and charges in barnase. (a) Distribution of the average ratios between the field magnitudes E^q(p+w), the local field due to the permanent dipoles of protein and water, and E^q(p), the local field due to the permanent dipoles of the protein alone. Fields are computed at atomic positions and averaged along the entire 120 ps trajectory of the simulation. (b) Distribution of the fluctuations of the ratios described above about their average values. (c) Distribution of the average angles between the local field vectors described in (a).
TABLE 2

Electrostatic interactions in barnase: contributions from induced dipoles and solvent

Quantity                        Flexible protein   Flexible protein   Rigid crystal
                                in vacuum          in solvent         protein
Angle(E^t, E^q)          (a)    12.1 (6.4)         9.1 (5.4)
                         (b)    13.1 (6.6)         12.4 (6.5)
                         all    12.7 (6.7)         11.5 (6.5)         12.9 (14)
Angle(E^μ, E^q)          (a)    91.9 (24.1)        91.2 (27.6)
                         (b)    87.9 (23.7)        84.3 (24.9)
                         all    90.8 (23.6)        89.1 (26.0)        89 (43)
Angle(E^q(p+w), E^q(p))  (a)                       42.7 (18.4)
                         (b)                       20.2 (10.7)
                         all                       32.1 (15.2)
Angle(E^μ(p+w), E^μ(p))  (a)                       54.7 (25.3)
                         (b)                       32.8 (17.8)
                         all                       45.2 (21.8)
E^t/E^q                  (a)    1.12 (0.40)        1.12 (0.44)
                         (b)    1.12 (0.38)        1.16 (0.41)
                         all    1.13 (0.40)        1.14 (0.45)        1.07 (0.30)
E^μ/E^q                  (a)    0.28 (0.13)        0.22 (0.12)
                         (b)    0.31 (0.15)        0.30 (0.15)
                         all    0.30 (0.14)        0.27 (0.14)        0.31 (0.30)
E^q(p+w)/E^q(p)          (a)                       1.56 (0.63)
                         (b)                       1.12 (0.26)
                         all                       1.28 (0.44)
E^μ(p+w)/E^μ(p)          (a)                       1.15 (0.51)
                         (b)                       1.08 (0.34)
                         all                       1.13 (0.50)

Angles are in degrees.

120 ps MD simulations of Barnase, July 1989

The table summarizes the results obtained. It lists mean values and fluctuations (in parentheses) of computed atomic averages of quantities whose distributions are shown in Figs. 4-7.

The first column gives the expression of the computed quantity, using the same conventions as in the text and figures. Columns 2 and 3 list results obtained from distributions computed with barnase trajectories in the absence and in the presence of water molecules, respectively. Column 4 lists results obtained from computations performed on the static crystal structure of barnase in vacuum. (all) designates values obtained using all atoms in the protein; (a) designates values obtained for atoms that expose more than 90% of their surface to solvent (a total of 392 atoms); and (b) designates values obtained for buried atoms (446 in number), with less than 10% of their surface accessible to solvent.

Note that the computed standard deviations (given in parentheses in column 4) concern spatial fluctuations, not time-dependent fluctuations as in columns 2 and 3, and are therefore of a different nature.


CONCLUDING  REMARKS 

In this account, recent results have been presented on 120 ps Molecular Dynamics simulations of the small protein barnase in the presence of explicit water molecules. Although detailed analysis of the trajectories is still in progress, evidence has been presented that they provide a reasonable description of the protein and water portions. The deviation of the average protein conformation from the starting crystal structure is reasonably low (~1.1 Å rms for backbone atoms), and the agreement between computed and crystallographic atomic fluctuations is satisfactory for portions of the protein that do not participate in crystal contacts. The structure of the water around polar and non-polar groups on the protein surface also seems reasonable, in that it agrees well with previous observations made either in hydrated crystals or in other simulations of small solvated molecules.

The thrust of our study concerned the use of the generated trajectories to obtain
a detailed microscopic description of the different contributions, including electronic
polarizability and solvent effects, to electrostatic interactions in a solvated protein.
Table 2 summarizes the highlights of the results presented here. A number of clear
trends emerge. The contribution of surrounding water to the electrostatic fields due to
permanent dipoles inside the protein is substantial: on average, it changes field
magnitudes by 28% and field orientations by 32 degrees. In comparison,
the contribution from electronic polarizability alone is much lower, with an average of
~10% in field magnitude and ~12 degrees in field orientation. However, analysis of
how these effects are distributed shows that they display an appreciable degree of
inhomogeneity throughout the protein matrix. This is confirmed by the finding that
solvent polarization effects are, on average, twice the size of induced dipole
effects in exposed regions, while in buried regions the two contributions are (on
average) of comparable size. Other factors, such as the precise local constellation of
polar and non-polar groups, should further modulate the relative importance of
electronic polarizability versus water polarization. Further analysis is in progress to
illustrate this point.
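The two quantities quoted above, a change in field magnitude (in %) and a change in field orientation (in degrees), can be computed atom by atom from two field vectors. The sketch below assumes that the reference field is that of the permanent dipoles alone and that the perturbed field adds the water (or induced-dipole) contribution. The exact definitions used for Table 2 are not stated here, so the signed relative-magnitude formula and the averaging of its absolute value are assumptions.

```python
import math

def _norm(v):
    return math.sqrt(sum(c * c for c in v))

def field_change(e_ref, e_new):
    """Return (% change in |E|, angle in degrees) between two field vectors.

    e_ref: field from permanent dipoles only (assumed reference).
    e_new: field after adding the solvent or induced-dipole contribution.
    """
    n_ref, n_new = _norm(e_ref), _norm(e_new)
    pct = 100.0 * (n_new - n_ref) / n_ref
    cos_t = sum(a * b for a, b in zip(e_ref, e_new)) / (n_ref * n_new)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against round-off outside [-1, 1]
    return pct, math.degrees(math.acos(cos_t))

def average_changes(pairs):
    """Mean |% change| and mean angle over (e_ref, e_new) pairs, e.g. one per atom.

    Averaging the absolute magnitude change is an assumption of this sketch.
    """
    changes = [field_change(r, n) for r, n in pairs]
    mean_pct = sum(abs(p) for p, _ in changes) / len(changes)
    mean_ang = sum(a for _, a in changes) / len(changes)
    return mean_pct, mean_ang
```

Averaging over buried and exposed atoms separately, rather than over all atoms at once, is one way to expose the inhomogeneity discussed above.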

Our study represents substantial progress towards a detailed microscopic
description of electrostatic interactions in proteins. But it is only a first step in
an ongoing effort, since many problems still need to be overcome. Simulation time
scales need to be extended, possibly to the nanosecond range; several parallel
simulations should be performed to reduce bias from initial conditions; and induced
polarizability must be included in the Molecular Dynamics simulations themselves, not
to mention the need to rederive force-fields and energy parameters that are physically
consistent with a detailed microscopic description.

As computational power increases exponentially, our ability to tackle these
problems steadily improves. We hope to produce better trajectories of
solvated proteins in the near future, and to extract from them improved descriptions
of electrostatic properties, which we intend to compare with those obtained by other
methods, such as the Langevin dipole treatment for water (8) and various macroscopic
continuum approaches (13) (17).
DISCUSSION


SURCOUF - When you performed your calculations on the crystal structure, did you
take into account the water molecules of the crystal structure?

WODAK - The Molecular Dynamics simulations included 94 water molecules whose
positions were determined crystallographically, as well as 2265 additional water
molecules whose starting positions were generated on a cubic lattice dimensioned to
yield unit density.
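For a simple cubic lattice with one water per cell, "unit density" fixes the lattice spacing: each cell must hold the volume of one water molecule at 1 g/cm³. The sketch below assumes a molar mass of 18.015 g/mol and a simple-cubic arrangement. The function names are hypothetical, and a real setup would also discard lattice sites that overlap protein atoms (not shown).

```python
import math

AVOGADRO = 6.02214076e23      # mol^-1
WATER_MOLAR_MASS = 18.015     # g/mol (assumed value)

def unit_density_spacing(density_g_cm3=1.0):
    """Simple-cubic lattice spacing (Å) giving one water per cell at the given density."""
    volume_cm3 = WATER_MOLAR_MASS / (density_g_cm3 * AVOGADRO)  # volume per molecule
    volume_a3 = volume_cm3 * 1e24                                # 1 cm^3 = 1e24 Å^3
    return volume_a3 ** (1.0 / 3.0)

def cubic_lattice(n_per_side, spacing, origin=(0.0, 0.0, 0.0)):
    """Starting (oxygen) positions on an n x n x n simple-cubic lattice."""
    ox, oy, oz = origin
    return [(ox + i * spacing, oy + j * spacing, oz + k * spacing)
            for i in range(n_per_side)
            for j in range(n_per_side)
            for k in range(n_per_side)]

a = unit_density_spacing()
print(f"spacing = {a:.2f} Å")  # about 3.10 Å
```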

Calculations performed on the static crystal structure of barnase did not include any
water molecules, as the purpose of these calculations was to analyze the intrinsic
properties of the protein matrix.


ANGYAN - I should like to comment on the implementation of the induced dipole model
in MD simulation. The most difficult problem is the calculation of forces. One has
essentially two possibilities that work:

a) do not iterate at all: then the forces do not include the effect of the field gradient of the
induced moments on the induced moments.

b) iterate up to self-consistency: then the above term appears in the force expression.

Nevertheless, one should be extremely careful in the third, not-fully-iterated case,
because there is a risk of non-conservation of energy. The reason is that for
induced moments that are not self-consistent, the "obvious" force formula is not the exact
derivative of the energy expression.

WODAK - Indeed, using an iterative self-consistent procedure to compute induced
dipoles and fields is equivalent to computing energy derivatives numerically rather
than by exact analytic procedures, and can therefore cause energy conservation
problems. Very stringent convergence criteria must be used to prevent such problems.
Better ways of incorporating self-consistent effects that do not suffer from these
shortcomings require modifying the equations of motion in ways that do not
significantly affect the actual dynamics of the system.
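To make option b) above concrete, here is a minimal sketch of iterating induced dipoles to self-consistency, μ_i = α_i (E⁰_i + Σ_j T_ij μ_j). It assumes isotropic point polarizabilities, a given permanent field E⁰_i at each site and the point-dipole field tensor, in Gaussian-like units (α in Å³, distances in Å). It uses plain Jacobi iteration with an explicit convergence tolerance, the kind of stringent criterion mentioned in the reply. It computes dipoles only, not forces, and the function names are hypothetical.

```python
import math

def dipole_field(mu, r_vec):
    """Field at displacement r_vec from a point dipole mu: (3(mu.r̂)r̂ - mu) / r^3."""
    r = math.sqrt(sum(c * c for c in r_vec))
    rhat = [c / r for c in r_vec]
    mu_dot = sum(m * h for m, h in zip(mu, rhat))
    return [(3.0 * mu_dot * h - m) / r ** 3 for m, h in zip(mu, rhat)]

def induced_dipoles(positions, alphas, e_perm, tol=1e-10, max_iter=500):
    """Jacobi iteration of mu_i = alpha_i * (E0_i + sum_j T_ij mu_j) to self-consistency."""
    n = len(positions)
    mu = [[a * e for e in e0] for a, e0 in zip(alphas, e_perm)]  # non-iterated first guess
    for _ in range(max_iter):
        new_mu = []
        for i in range(n):
            field = list(e_perm[i])
            for j in range(n):
                if i == j:
                    continue
                r_vec = [pi - pj for pi, pj in zip(positions[i], positions[j])]
                field = [f + g for f, g in zip(field, dipole_field(mu[j], r_vec))]
            new_mu.append([alphas[i] * f for f in field])
        delta = max(abs(a - b) for m1, m2 in zip(mu, new_mu) for a, b in zip(m1, m2))
        mu = new_mu
        if delta < tol:
            return mu
    raise RuntimeError("induced dipoles did not converge")
```

For two identical sites a distance r apart on the axis of a uniform field E, the converged moment is αE / (1 − 2α/r³). Stopping after the first pass corresponds to option a). Stopping part-way with a loose tolerance is the not-fully-iterated case that raises the energy-conservation concern.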
PEPE - Are the electrostatic effects all related to a choice of charge distribution?


WODAK - Yes, they are indeed. We have shown in our published study (Van Belle et
al., 1987) that using different values of electrostatic charges on ionizable groups in
proteins influences the sizes and directions of induced dipoles to an extent that is
comparable to, or larger than, the induction effect itself. It may be expected
that the influence of changing the values of permanent dipoles and charges should
be reduced in the presence of solvent due to screening effects, but the relevant
computations still need to be done.


SOUMPASIS - In the MD calculations you use a cutoff of 7 Å (for computational
convenience) and compare the result with the static calculation, where there is no
cutoff. Why don't you use the same cutoff in both calculations?

WODAK - The published work on the analysis of induction effects in proteins reported
results obtained using an infinite cutoff distance for pairwise interactions. But
in the present account, where results from calculations performed on the
computer-generated trajectory and on the static crystal structure are compared, the same cutoff
distance and switching/shifting potential for long-range interactions have been used.
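Using the same cutoff scheme for the trajectory and for the static structure amounts to evaluating every pairwise term with the same truncated potential. The exact switching/shifting function used is not specified here. As an assumption, the sketch below uses the common CHARMM-style electrostatic shift factor (1 − (r/r_c)²)² with the 7 Å cutoff mentioned in the question. The conversion constant 332.0636 kcal·Å/(mol·e²) is likewise an assumed value.

```python
COULOMB_KCAL = 332.0636  # kcal·Å/(mol·e^2), assumed conversion constant

def shifted_coulomb(qi, qj, r, r_cut, epsilon=1.0):
    """Pair energy (kcal/mol) with a CHARMM-style 'shift' factor (1 - (r/r_cut)^2)^2.

    The factor brings both the energy and its slope to zero at r_cut, so there is
    no jump in energy or force when a pair crosses the cutoff.
    """
    if r >= r_cut:
        return 0.0
    shift = (1.0 - (r / r_cut) ** 2) ** 2
    return COULOMB_KCAL * qi * qj / (epsilon * r) * shift

# The same function, with the same r_cut, would be applied to every frame of the
# trajectory and to the static crystal structure.
R_CUT = 7.0
```

At short range the shifted energy is nearly the bare Coulomb term. It deviates progressively as r approaches the cutoff, which is why comparisons are only meaningful when both calculations share the same r_cut.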


BUCKINGHAM - You indicated that the electric field arising from water molecules tends
to be perpendicular to the field of the permanent dipoles of the protein. This contrasts
with the Onsager reaction field model, in which the field due to the polarized
environment is parallel to the primary dipole.

WODAK  -  There  must  be  a  misunderstanding.  I  indicated  that  the  local  field 
components  arising  from  induced  electronic  polarizability,  and  not  the  fields  arising 
from  the  presence  of  water  molecules,  tend  to  be  perpendicular  to  the  local  fields 
generated  by  the  permanent  dipoles  of  the  system. 


KWIATKOWSKI - Several scientific groups try to construct atom-atom potentials (in fact,
these potentials depend on the choice of several parameters). You have used the
parameters from CHARMM in your calculation. To what extent would your quantitative
predictions change if you used another potential?


WODAK - The molecular trajectories generated by Molecular Dynamics will obviously
depend on the energy functions and the values of energy parameters. The results of the
detailed electrostatic calculations will more particularly depend on the values of partial
charges and of atomic polarizabilities (as shown in our published work, Van
Belle et al., 1988, the dependence on the former is more dramatic). We believe,
however, that the general conclusions reached here, namely (1) that contributions to
electrostatic fields from surrounding solvent are on average three times the size of
contributions from electronic polarizability, and (2) that the relative contributions from
solvent and electronic polarizability are highly dependent on the local environment,
would hold when other force-fields representative of protein structures are used.


DURUP - I could not see whether your pictorial description of water molecules allows
one to characterize the features of the first and second water layers, bulk water, water
clusters, etc.

WODAK  -  The  pictorial  representation  of  water  structure  around  specific  protein  groups 
(side-chains)  allows  a  visual  analysis  mainly  of  the  first  layer  of  water  molecules,  and 
that  only  in  a  relatively  limited  local  region. 


PERAHIA - Did you analyze the variation of your statistics when using trajectories
started with different initial conditions? I ask this question because the electrostatic
energy is sensitive to large-amplitude motions, and most of them are not necessarily
present in a single trajectory of 120 ps.

WODAK - Being severely limited by computer time, we have so far not been able to
generate more than one trajectory, although we are well aware that data
from multiple trajectories would be very useful.


SOUMPASIS - What is the best effective dielectric constant you would suggest for a
protein in water on the basis of your MD work?


WODAK - This question cannot be readily answered at present. Our calculations have
shown that neither ε = 3 nor ε = r_ij (where r_ij is the distance between a pair of atoms
i and j) can properly reproduce electrostatic energies computed using a detailed
atomic representation (using ε = 1 and including polarizability effects). A similar
analysis has not yet been performed in the presence of surrounding water.

Most  importantly,  our  analysis  demonstrates  that  the  protein  matrix  is  rather 
heterogeneous  electrostatically  and  argues  strongly  against  the  use  of  the  same 
effective  dielectric  constant  throughout. 
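The dielectric treatments named in the reply differ only in how the Coulomb pair energy is screened. The sketch below is an illustration, not the authors' code. It evaluates a single pair under ε = 1, ε = 3 and ε = r_ij, with r in Å and an assumed conversion constant of 332.0636 kcal·Å/(mol·e²). It omits the polarizability effects that the detailed ε = 1 reference calculation includes.

```python
COULOMB_KCAL = 332.0636  # kcal·Å/(mol·e^2), assumed conversion constant

def pair_energy(qi, qj, r, model="vacuum"):
    """Coulomb pair energy (kcal/mol) under three simple dielectric treatments.

    'vacuum'   : ε = 1 (no polarizability term in this sketch)
    'eps3'     : constant ε = 3
    'distance' : ε = r_ij with r in Å, so the energy falls off as 1/r^2
    """
    if model == "vacuum":
        eps = 1.0
    elif model == "eps3":
        eps = 3.0
    elif model == "distance":
        eps = r
    else:
        raise ValueError(f"unknown model: {model}")
    return COULOMB_KCAL * qi * qj / (eps * r)

for r in (3.0, 6.0, 12.0):
    print(r, [round(pair_energy(1, -1, r, m), 1) for m in ("vacuum", "eps3", "distance")])
```

The two screened models coincide at r = 3 Å and diverge at longer range. Any single global choice therefore implies a fixed distance dependence of screening, whereas the analysis above argues that screening varies across the protein matrix.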
INTRODUCTION

Establishing a clear correlation between protein structure and function is
one of the major goals in experimental as well as theoretical molecular biology. It
is becoming widely recognized that the concept of electrostatic free energy plays
a key role in translating a particular structural pattern into specific functional
properties (1-4). The efforts of our laboratory in the past 15 years have to a
large extent been invested in a microscopic verification of this point of view (see,
for example, refs. 1, 5, 6-11). These studies, which will be briefly reviewed in
what follows, involve the development of a simplified microscopic model [which
provided the first semi-quantitative way of studying the energetics of solvated
proteins (1,5)] as well as more advanced approaches that have become practical
with the emergence of modern powerful computers. The implementation of these
strategies in studies of several different types of protein functions is reviewed in
the subsequent sections.


EVALUATION  OF  ELECTROSTATIC  FREE  ENERGIES  IN 
SOLVATED  PROTEINS 

In order to understand biological processes on a molecular level, it is crucial
to be able to estimate the relevant electrostatic free energies. This could not
readily be accomplished by macroscopic models, since these inevitably rely
on phenomenological dielectric constants (whose relation to the actual
microenvironment is not given by macroscopic considerations). The point of view
taken in this laboratory quite early was to avoid the macroscopic formulation
altogether and instead attempt to follow a route based on microscopic
considerations. The enormous dimensionality of electrostatic problems in proteins
and the long-range nature of these effects have motivated the development of
simplified models (1,5), which still capture the major contributions to
electrostatic energies. The Protein Dipoles Langevin Dipoles (PDLD) model is an
example of this type of approach; it is based on the working hypothesis that
the electrostatic energy of the average structure of the system is a good
approximation for the electrostatic free energy of the system. The average
structure of the protein is usually taken as the experimentally determined
conformation from X-ray crystallography, and the contributions from groups of
the protein to the electrostatic energy are evaluated on a microscopic level, where
permanent residual charges and induced dipoles are taken into account. The
solvent around the protein is treated by considering its average polarization, as
given by the Langevin dipole model, and representing the solvent molecules by a
grid of Langevin-type dipoles [see (12) for details].

The  philosophy  behind  this  simplified  water  model  is  as  follows.  The 
average  polarization  of  any  given  water  molecule  near  an  ion  is  related  in  some 
way  to  the  field  from  the  ion.  If  we  knew  the  distribution  function  for  this 
average  polarization  we  could  evaluate  the  ion  solvation  free  energy  by  summing 
the  scalar  products  of  the  average  dipole  at  each  grid  point  and  the  field  from  the 
ion at this point, and then adding the free energy of polarizing the solvent dipoles
(this polarization costs about half of what is gained from the field-dipole
interactions). Fortunately, one can determine the polarization of water molecules
as a function of the applied field from microscopic simulations (lc) (and/or by
refining the model by fitting calculated and observed solvation energies (5)) and
represent the resulting polarization function by a scaled Langevin-type
expression. By surrounding the solute (or protein) with a grid of polarizable
dipoles obeying the expected solvent polarization, it appears that free energies
associated with solvation can be reproduced reasonably well without the need for an
all  atom  water  model.  The  PDLD  scheme,  which  treats  both  the  solvent  and  the 
protein  polarization  explicitly,  thus  provides  a  simple  and  effective  way  of 
assessing  electrostatic  energies  in  solvated  proteins. 
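The recipe just described can be sketched for a single point ion surrounded by a grid of Langevin dipoles. Each grid dipole takes the magnitude μ₀·L(C·μ₀·|E|/kT) along the local field, where L(x) = coth(x) − 1/x. The energy is the sum of −μ_i·E_i with half of it returned as the polarization cost, i.e. −½ Σ μ_i·E_i. The parameter values (μ₀, the scaling constant C, kT), the grid and the unit Coulomb constant are hypothetical. Treating "about half" as exactly one half is a linear-response assumption. This is not the published PDLD code.

```python
import math

def langevin(x):
    """Langevin function L(x) = coth(x) - 1/x; uses L(x) ≈ x/3 near x = 0."""
    if abs(x) < 1e-4:
        return x / 3.0
    return 1.0 / math.tanh(x) - 1.0 / x

def langevin_dipole(field, mu0, c_scale, kT):
    """Average dipole vector at a grid point: mu0 * L(c_scale * mu0 * |E| / kT) along E."""
    e = math.sqrt(sum(f * f for f in field))
    if e == 0.0:
        return [0.0, 0.0, 0.0]
    magnitude = mu0 * langevin(c_scale * mu0 * e / kT)
    return [magnitude * f / e for f in field]

def ion_solvation_energy(ion_charge, ion_pos, grid, mu0, c_scale, kT):
    """Approximate solvation free energy of a point ion in a grid of Langevin dipoles.

    Sums -mu_i . E_i over grid points and adds back half of it as the cost of
    polarizing the dipoles, i.e. returns -1/2 * sum(mu_i . E_i).
    Field in units where the Coulomb constant is 1 (assumption of this sketch).
    """
    total = 0.0
    for point in grid:
        r_vec = [p - q for p, q in zip(point, ion_pos)]
        r = math.sqrt(sum(c * c for c in r_vec))
        field = [ion_charge * c / r ** 3 for c in r_vec]
        mu = langevin_dipole(field, mu0, c_scale, kT)
        total += -0.5 * sum(m * f for m, f in zip(mu, field))
    return total
```

Because each dipole aligns with the local field, the energy is stabilizing for both signs of the ion charge. The saturation built into L(x) limits the response of the dipoles closest to the ion.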

With the recent advances in computing power, it has become possible to
sample the phase space of highly multidimensional systems by direct simulation
approaches such as Monte Carlo (MC) and MD methods. Consequently, one can explore formal
statistical mechanical calculations, which earlier had been beyond reach but
can now be employed, albeit with serious convergence problems, even for systems of
considerable size (9,10,13-16,26,36,38). With these so-called "free energy
perturbation" (FEP) methods (17a,b), one can directly evaluate free energy
differences from MC or MD trajectory calculations. This is usually done by
introducing  a  mapping  parameter  (controlling  the  effective  potential  surface  in 
the  simulation)  with  which  the  system  can  be  gradually  driven  from  a  particular 
reference state to a state whose free energy one wishes to calculate (such an
approach  was  first  introduced  for  ion  solvation  problems  in  ref.  13a  and  for 
enzymes  in  ref.  13b).  In  this  type  of  calculation  it  has  become  practically  possible 
to  treat  a  large  part  of  the  system  microscopically,  by  explicit  all-atom 
representations.  An  inherent  difficulty,  however,  in  performing  such  microscopic 
simulations  on  large  molecular  systems  is  associated  with  the  selection  of 
appropriate  boundary  conditions.  In  this  respect,  the  use  of  periodic  boundary 
conditions  in  combination  with  simple  truncation  of  interactions  beyond  a  certain 
range or in combination with Ewald summation has become the most popular
prescription.  However,  it  is  clear  that  in  some  cases  the  introduction  of  artificial 
periodicity  or  symmetry  in  the  calculations  can  lead  to  rather  unphysical 
representations  of  the  relevant  electrostatic  fields  [for  an  excellent  discussion  of 
these  problems,  see  ref.  (17c)]. 
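The mapping-parameter procedure described above is commonly written with a potential ε_m = (1 − λ_m)ε_A + λ_m ε_B and the Zwanzig relation ΔG = −kT Σ_m ln⟨exp(−(ε_{m+1} − ε_m)/kT)⟩_m, averaged over configurations sampled on ε_m. The text gives no explicit formulas, so the linear mapping and the function names below are assumptions. The sketch only post-processes energy differences that a simulation would supply.

```python
import math

def mapped_potential(u_a, u_b, lam):
    """Mapping potential for window lam: (1 - lam) * U_A + lam * U_B."""
    return (1.0 - lam) * u_a + lam * u_b

def window_free_energy(delta_u_samples, kT):
    """Zwanzig estimate -kT ln <exp(-dU/kT)>_m for one window.

    delta_u_samples: U_{m+1} - U_m evaluated on configurations sampled in window m.
    A log-sum-exp shift keeps the exponentials from overflowing.
    """
    xs = [-du / kT for du in delta_u_samples]
    x_max = max(xs)
    mean = sum(math.exp(x - x_max) for x in xs) / len(xs)
    return -kT * (x_max + math.log(mean))

def fep_free_energy(windows, kT):
    """Total free energy change: sum of the window contributions along the mapping."""
    return sum(window_free_energy(w, kT) for w in windows)

KT_298 = 0.593  # kcal/mol, approximately kT at 298 K
```

The exponential average is dominated by rare low-energy samples. This is one concrete source of the convergence problems noted above, and the reason the mapping is split into many closely spaced windows.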

Considering the above-mentioned problems, we have concentrated on
developing non-periodic models. In the first attempt to evaluate solvation free
energies  in  a  charge  transfer  reaction  by  an  all-atom  solvent  model  using 
FEP/MD  simulations  (13),  the  dimensionality  of  the  problem  was  reduced  by 
applying  surface  constraints  [introduced  originally  in  the  SCSSD  model  (18)]  to 
the  solvent  molecules.  This  was  done  by  using  a  limited  number  of  water 
molecules  surrounded  by  a  spherical  surface  layer  of  molecules  that  represents 
the  effect  of  the  surrounding  in  a  corresponding  infinite  system.  A  more  refined 
model  incorporating  explicit  dynamical  polarization  constraints  was  subsequently 
reported  (19),  referred  to  as  the  Surface  Constraint  All  Atom  Solvent  (SCAAS) 
model.  This  model  uses  a  limited  number  of  waters  in  order  to  create  a  sphere  of 
solvent around the relevant groups (the radius of this sphere is typically
10-15 Å). Angular and radial constraints are incorporated in order to
compensate  for  the  artificial  surface  created  as  a  result  of  using  a  finite  number 
of  water  molecules.  The  extension  of  the  SCAAS  model  to  free  energy 
calculations  of  solvated  proteins  [and  its  use  as  a  substitute  for  the  earlier  PDLD 
(1,5) and SCSSD (18) models] is quite straightforward (20). The SCAAS model
has  been  used  extensively  in  this  laboratory  in  recent  years  and  appears  to  provide 
an  efficient  way  to  carry  out  microscopic  simulations  of  solvated  proteins  and 
chemical  processes  in  solution.  However,  despite  its  ability  to  overcome  some  of 
the  problems  associated  with  imposing  artificial  symmetry  on  the  system,  the 
method  still  requires  the  truncation  of  long-range  interactions  in  order  to  reduce 
computational  costs.  Unfortunately,  when  treating  larger  systems  it  is  of  great 
importance  to  use  a  large  cutoff  radius  for  water-water  interactions  in  order  to 
account for long-range electrostatic correlations. A step towards resolving this dilemma is
the  recently  developed  extension  of  the  Ewald  method  to  non-periodic  systems 
(21),  which  may  allow  for  the  practical  implementation  of  SCAAS  systems  of 
much larger size than the ones currently used. At any rate, we presently find (by
using calculations of intrinsic pKa's as a general benchmark) that the accuracy of
FEP calculations is still not much better than that of the PDLD method (in some
cases the accuracy is lower than that of the PDLD model). This might reflect the
fact that present simulation times allow us to sample only an extremely small part of
the relevant phase space, as well as, of course, the above-mentioned long-range problem.

SIMULATIONS  OF  ENZYME  CATALYSIS 

Our  studies  of  enzymatic  reactions  have  been  largely  based  on  the  notion 
that  the  difference  between  enzyme  and  solution  reactions  is  mainly  determined 
by  the  corresponding  surroundings  of  the  reacting  systems  and  not  so  much  by 
the  actual  quantum  mechanics  of  the  "solute"  atoms.  Apparently,  purely  quantum 
mechanical approaches to enzyme catalysis that do not incorporate the
environmental effects from the protein into the solute hamiltonian may lead to
rather irrelevant results. These considerations led to the development of SCF-MO
approaches (5) (MINDO-2 and QCFF/ALL), which incorporated the microscopic
potential of the protein and its surrounding solvent into the quantum mechanical
hamiltonian. In later stages of these studies (6,13) it was concluded that the
Valence Bond (VB) method had better chances of producing conclusive results
(since it can easily be calibrated with solution experiments in a much less ambiguous
way than other methods). This led to the development of the Empirical Valence
Bond (EVB) method, which we will only briefly review here.

The  introduction  of  the  EVB  scheme  for  studies  of  enzymatic  reactions  was 
based  on  the  observation  that  many  chemical  reactions  can  be  described  in  terms 
of  elementary  "resonance  structures"  (RS)  that  specify  the  charges  and  bonding  of 
the reacting atoms at different VB configurations; e.g. a heterolytic bond
cleavage is described by (X-Y) and (X⁻ Y⁺). Furthermore, the different RS can
be  conveniently  represented  by  standard  analytical  force  fields  possibly 
augmented  by  gas-phase  quantum  mechanical  calculations.  With  this  type  of 
description  one  can  explore  chemical  reactions  in  enzymes  by  evaluating  the 
energetics  of  the  relevant  RS  in  solution  and  in  the  protein’s  active  site  (the  EVB 
method  is  easily  implemented  within  the  PDLD  as  well  as  the  FEP/MD 
framework).  For  instance,  a  proton  transfer  reaction  in  an  enzyme  can  be 
examined by considering the energetics of the two RS (A-H B) and (A⁻ H-B⁺). If
the protein is designed to catalyse such a reaction, it will stabilize the (A⁻ H-B⁺)
state relative to (A-H B) more than water does. Similar treatments can be
introduced  to  study  more  complicated  enzymatic  reactions  (see  ref.  10  for  a 
detailed  description). 
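The EVB treatment outlined above is usually completed by mixing the resonance-structure energies through an off-diagonal coupling H12. The ground-state surface is then taken as the lowest eigenvalue of the resulting 2×2 matrix. This standard two-state form is not spelled out in this text, so the sketch below assumes it. Here H11 and H22 stand for the energies of (A-H B) and (A⁻ H-B⁺) at a given configuration, including their interaction with the surroundings. The function names are hypothetical.

```python
import math

def evb_ground_state(h11, h22, h12):
    """Lowest eigenvalue of the 2x2 EVB Hamiltonian [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    half_gap = 0.5 * (h11 - h22)
    return mean - math.sqrt(half_gap ** 2 + h12 ** 2)

def ground_state_weights(h11, h22, h12):
    """Squared coefficients (c1^2, c2^2) of the two resonance structures."""
    e = evb_ground_state(h11, h22, h12)
    if h12 == 0.0:
        return (1.0, 0.0) if h11 <= h22 else (0.0, 1.0)
    # The eigenvector (h12, e - h11) satisfies (h11 - e) * c1 + h12 * c2 = 0.
    c1, c2 = h12, e - h11
    norm = c1 ** 2 + c2 ** 2
    return (c1 ** 2 / norm, c2 ** 2 / norm)
```

Lowering H22, as an active site would by stabilizing (A⁻ H-B⁺) more than water does, can only lower the ground-state energy, and it does so in proportion to the ionic weight c2². This is one way to see how differential electrostatic stabilization of the ionic structure translates into a lower reaction barrier.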

The  EVB  approach  has  been  applied  to  several  important  families  of 
enzymes  (1,6,8-10,26).  In  the  cases  where  comparisons  were  made  between  the 
PDLD  and  the  FEP/MD  treatments,  the  two  methods  gave  similar  results.  For 
example, let us consider the interesting case of the Asp 32 → Ala mutation in
subtilisin (11). This mutation, which removes the catalytic Asp 32, leads to a
four orders of magnitude reduction in the reaction rate (22). The calculations on
this mutation are reported and summarized in ref. 11. The main catalytic effect is
reproduced by both the PDLD and FEP/MD methods. Both calculations indicate
that Asp 32 contributes to catalysis by electrostatic stabilization of the transition
state and not by the double proton transfer mechanism depicted in many textbooks.
The calculations above, as well as related studies on other enzymes, lend support to
the proposal that enzymes catalyse their reactions by providing active site
environments which are electrostatically complementary to the change in charge
distribution of the reacting system associated with forming the relevant transition
states (1,6,10,11).

The importance of electrostatic energy is particularly dramatized in metal-catalysed
reactions, where the enzyme often reduces activation barriers by more
than 10 kcal/mol. An illustrative example of this is provided by the enzyme
staphylococcal nuclease (SNase), which catalyses the hydrolysis of both DNA and
RNA at the 5' position of the phosphodiester bond (23-25):

$$\mathrm{H_2O} + 5'\text{-}\mathrm{OP(O_2)O}\text{-}3' \rightleftharpoons 5'\text{-}\mathrm{OH} + \mathrm{(HO)P(O_2)O}\text{-}3' \qquad (1)$$

EVB studies of this system (26) reproduce the overall observed catalytic effect of
the enzyme and demonstrate the crucial role of the metal charge in stabilizing the
hydroxide ion and reducing the energy cost of proton transfer from water to the
enzyme general base. The calculations also reproduce and explain the effect of
metal substitution on the corresponding catalytic activity. This indicates that the
EVB method captures the first-order effects in enzyme catalysis by focusing on
the relevant environmental effects (the change in electrostatic energy upon
moving from water to the enzyme active site).

ION  CHANNELS 

The  energetics  of  ion  transport  through  membrane  channels  is  a  rather 
complex  problem  since  the  interacting  system  comprises  several  different  phases. 
In addition to the membrane-spanning protein(s), with internal solvent molecules,
both  the  surrounding  aqueous  and  lipid  phases  contribute  to  the  overall  solvation 
of  a  migrating  ion.  The  most  widely  studied  system  for  channel  mediated  ion 
transport  is  the  Gramicidin  A  (GA)  channel.  Because  the  activation  energies  for 
ion  permeation  (5-7  kcal/mol)  are  known  from  direct  experimental  data  (29,30), 
the  difficulties  associated  with  calculations  of  energy  profiles  along  the  channel 
immediately become apparent. Although continuum calculations that assumed a
large  dielectric  constant  inside  the  channel  could  be  adjusted  to  reproduce  the 
observed  barrier  (31,32),  the  results  of  extensive  microscopic  simulation  studies 
have been rather disappointing, at least as far as activation barriers are
concerned. This appears to be mainly due to the neglect of some of the main
electrostatic contributions, while other factors are treated more or less rigorously
(33,34). In particular, it appears that models which neglect either the bulk water
phase or the lipid membrane cannot account for the fairly high permeation rates
that  are  observed  experimentally  (for  an  overview  of  early  calculations  on  the 
GA  system,  see  ref.  35). 

We have recently reported free energy calculations on the Na⁺-GA-water-membrane
system using both the PDLD and the FEP/MD methods (36), which
appear  to  resolve  the  problems  outlined  above.  In  these  calculations  the 
membrane  surrounding  the  GA  channel  was  represented  by  a  large  grid  of 
polarizable  point  dipoles  (with  no  permanent  dipole  moments),  giving  a  dielectric 
constant of ε_m = 2 for the membrane. By using the PDLD method in
combination  with  energy  minimization  (EM)  a  complete  energy  profile  for  the 
permeation  of  a  Na+  ion  was  calculated. 
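The text does not say how the dipole grid was parameterized to give ε_m = 2. One standard way to relate a lattice of polarizable points without permanent moments to a macroscopic dielectric constant is the Clausius-Mossotti relation, (ε − 1)/(ε + 2) = 4πnα/3. The sketch below uses it as an assumption, for a simple-cubic grid with number density n = 1/spacing³. Polarizabilities are polarizability volumes in Å³, and the grid spacing is a hypothetical input.

```python
import math

def clausius_mossotti_epsilon(alpha, spacing):
    """Dielectric constant of a simple-cubic lattice of point polarizabilities.

    alpha: polarizability volume (Å^3); spacing: lattice constant (Å).
    Uses (eps - 1)/(eps + 2) = 4*pi*n*alpha/3 with n = 1/spacing^3.
    """
    x = 4.0 * math.pi * alpha / (3.0 * spacing ** 3)
    if x >= 1.0:
        raise ValueError("polarization catastrophe: 4*pi*n*alpha/3 must be < 1")
    return (1.0 + 2.0 * x) / (1.0 - x)

def alpha_for_epsilon(epsilon, spacing):
    """Polarizability volume (Å^3) per grid point needed to reach a target eps."""
    return 3.0 * spacing ** 3 / (4.0 * math.pi) * (epsilon - 1.0) / (epsilon + 2.0)

alpha_m = alpha_for_epsilon(2.0, 3.0)  # hypothetical 3 Å grid, target eps_m = 2
```

For ε_m = 2 the right-hand side equals 1/4, so a 3 Å grid would need about 1.6 Å³ per point under these assumptions.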

It  has  been  found  that  the  calculated  values  for  the  energy  barriers  agree 
fairly  well  with  those  observed  experimentally.  Furthermore,  the  calculations 
indicate  that  the  contribution  to  the  ion  solvation  energy  in  the  channel  from  the 
polarization  of  the  surrounding  membrane  and  (to  a  lesser  extent)  from  the 
induced  dipoles  of  the  GA  helix  can  amount  to  as  much  as  ~10  kcal/mol. 

While  FEP/MD  simulations  for  this  system  would  be  very  expensive  for 
calculating  a  complete  free  energy  profile  (in  particular  at  the  channel  entrances 
where  a  large  number  of  water  molecules  would  have  to  be  included),  it  is  of 
interest  to  perform  such  a  calculation  in  the  interior  of  the  channel.  In  this  region 
only  a  limited  number  of  waters  are  needed  to  account  even  for  rather  long-range 
solvation  effects  and  the  results  can  be  directly  compared  to  those  obtained  with 
the  PDLD/EM  scheme  in  the  centre  of  the  channel.  The  results  of  such 
calculations  (36)  produced  a  difference  of  2.3  kcal/mol  between  the  free  energy 
of the ion in the centre of the channel and in water. This value is similar to the
corresponding PDLD result of -2 to +3 kcal/mol (36).

The  study  of  ion  solvation  in  the  GA  channel  clearly  illustrates  the  danger 
in  neglecting  certain  parts  of  the  system  for  which  electrostatic  energies  are 
evaluated.  In  this  case,  discarding  the  induced  membrane  polarization  appears  to 
be  equivalent  to  introducing  an  error  of  ~10  kcal/mol  and  it  is  therefore  not 
surprising  that  most  reported  energy  profiles  have  given  too  high  energies  in  the 
interior  of  the  channel. 

ELECTROSTATIC  CONTROL  OF  THE  CHARGE  SEPARATION 
PROCESS  IN  PHOTOSYNTHETIC  SYSTEMS 

The recent determination of the X-ray structure of bacterial reaction
centres (37) provides an opportunity for probing the detailed molecular nature of
photosynthetic processes. In particular, one can now attempt to explain the
relation between the protein structure and the energetics of the primary charge
separation event. A major challenge in this task is the puzzling directionality of
the electron transfer process in bacterial reaction centres. That is, the X-ray
structure reveals a bacteriochlorophyll dimer P, which serves as the primary
electron donor, two neighboring accessory bacteriochlorophylls (BL and BM) and
two bacteriopheophytins (HL and HM). These chromophores are arranged in a
seemingly very symmetric manner in two subunits (L and M), forming the (PBH)L
and (PBH)M branches. Despite this nearly symmetric arrangement, it appears that
the primary charge separation process occurs almost exclusively via the L branch,
while the electron transfer pathway through the M branch is virtually blocked.
This finding offers an exciting opportunity for understanding the control of
electron transfer pathways by protein microenvironments. Furthermore, this
problem is a particularly useful test case for various methods of correlating
structure and function in proteins.

With the X-ray structure of the reaction centre we can calculate the free
energy of the various possible charge transfer states along the charge separation
coordinate. This can be done by either the PDLD or the FEP method provided,
however, that the calculations include all the key electrostatic elements (protein
permanent dipoles, protein induced dipoles, and the surrounding water and
membrane). Furthermore, meaningful calculations require that one take into
account the energy of forming the relevant charged chromophores in the gas
phase. This energy (which is not given in a reliable way by current quantum
mechanical methods) can be obtained by using experimentally observed redox and
oxidation potentials and the energies of the different charge transfer states in
solution (38). We thus use a thermodynamic cycle and obtain the relationship

$$\Delta G_i^{p} = \Delta G_i^{w} + \Delta\Delta G_{\mathrm{sol},i}^{w\rightarrow p} \qquad (2)$$

where $\Delta G_i^{p}$ is the free energy of forming the indicated state $i$ in its protein site,
$\Delta G_i^{w}$ is the free energy of forming the same state in water (solution),
while $\Delta\Delta G_{\mathrm{sol},i}^{w\rightarrow p}$ is the change in the "solvation" free energy of the given state upon
moving the chromophores from water to their protein sites.
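As a minimal sketch of how eq. (2) is applied state by state along each branch, the Python below adds the water-phase free energy of a charge-transfer state to its water-to-protein solvation correction. The class and function names are hypothetical, and the criterion in `favoured_branch` (comparing the highest $\Delta G^{p}$ reached along each branch) is an illustrative assumption rather than the PDLD procedure of ref. (39); the inputs would come from solution redox data and from PDLD or FEP calculations.

```python
from dataclasses import dataclass


@dataclass
class ChargeTransferState:
    """One charge-transfer state i along the charge-separation coordinate."""
    label: str
    dg_water: float         # ΔG_i^w (kcal/mol): from redox data for the state in solution
    ddg_solv_w_to_p: float  # ΔΔG_sol,i^{w→p} (kcal/mol): e.g. from a PDLD or FEP calculation

    @property
    def dg_protein(self) -> float:
        # Eq. (2): ΔG_i^p = ΔG_i^w + ΔΔG_sol,i^{w→p}
        return self.dg_water + self.ddg_solv_w_to_p


def favoured_branch(l_states, m_states) -> str:
    """Hypothetical criterion: the branch whose highest ΔG^p along the path is lower."""
    highest_l = max(s.dg_protein for s in l_states)
    highest_m = max(s.dg_protein for s in m_states)
    return "L" if highest_l < highest_m else "M"
```

In this form the electrostatic control discussed below enters only through `ddg_solv_w_to_p`: two branches with identical solution energetics can still differ once the protein microenvironment is included.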

The PDLD calculations of $\Delta G_i^{p}$ for the two branches show a clear
preference for the L branch, with a significant activation barrier for charge
separation through the M branch (39). These findings indicate that the protein
microenvironment provides more electrostatic stabilization to the (P+B-)L state
than to the (P+B-)M state, thus contributing to the control of the charge separation
process and facilitating electron transfer through the L branch (additional control
of the electron transfer pathway may be provided through the electronic coupling
matrix elements). The calculations also indicate that a few amino acids (in
particular Tyr M208) are responsible for the difference between the two
branches. This prediction can be examined by site-specific mutations.


CONCLUDING REMARKS

In this paper we have dealt with a number of fundamental biomolecular
processes in which the electrostatic free energy of the reacting species seems to
play a decisive role in the overall activity. The examples given also emphasize
the importance of reliable evaluation of electrostatic free energies in any attempt
to examine the relationships between three-dimensional structure and function of
proteins. We have indicated (and to some extent demonstrated) that calculations of
energetics in proteins, in many instances, cannot become quantitative or
conclusive if the treatment of the electrostatic contribution is incomplete (e.g., by
neglecting surrounding solvent or induced dipoles). The use of simplified
representations for some contributions, within a complete treatment of
electrostatic effects, appears to be more effective than using rigorous
representations for only part of the system at the expense of neglecting other
parts (e.g., calculating a high-quality ab initio surface for reacting bonds while
neglecting the solvent around an active site, or even the active site itself).

Other studies from our lab that were not described here (10,39,40)
considered the role of purely dynamical effects on the rate constants of
biomolecular processes (which would manifest themselves in the magnitude of the
preexponential transmission factor). It is found in most cases, with the possible
exception of photobiological reactions (39,40), that such dynamical factors are of
minor importance in comparison with the relevant free energy barriers. In
particular, this appears to be the case for enzymatic reactions, which usually
involve significant activation barriers and therefore have a transmission factor
that does not differ substantially from unity.
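The weight of the transmission factor can be illustrated with the usual transition-state-theory expression $k = \kappa\,(k_B T/h)\exp(-\Delta g^{\ddagger}/RT)$, assuming this is the form in which the preexponential factor enters. The sketch below (function and constant names are ours) shows why a $\kappa$ of order unity matters little: halving $\kappa$ halves the rate, whereas raising the barrier by 1 kcal/mol at 300 K lowers it by roughly a factor of five.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_KCAL = 1.987204e-3       # gas constant, kcal/(mol K)


def tst_rate(barrier_kcal: float, temperature: float = 300.0, kappa: float = 1.0) -> float:
    """Transition-state-theory rate constant (s^-1) with transmission factor kappa:
    k = kappa * (k_B T / h) * exp(-barrier / RT), barrier in kcal/mol."""
    prefactor = K_B * temperature / H_PLANCK
    return kappa * prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))
```

Comparing `tst_rate(15.0, kappa=0.5) / tst_rate(15.0)` (exactly 0.5) with `tst_rate(14.0) / tst_rate(15.0)` (about 5.4) makes the point of the paragraph above in numbers.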

The developments reviewed here and in studies from other laboratories in
this exciting field are surely not the last word in the simulation of complicated
biological processes. It seems clear that the enormous advances in computer
technology will continue to enable changes in the simulation strategies used, and
lead to yet more sophisticated and reliable methods. Nevertheless, we believe that
the more refined pictures of biomolecular processes that will eventually emerge will
sustain the ideas about the key role of electrostatic free energies.
DISCUSSION


DURUP - I feel there's a problem with your model for SN2 reactions in solution. You
essentially are building diabatic states (your "resonant states") and then diagonalizing
H to get an adiabatic curve. Now if you were to do molecular dynamics and run a
reactive trajectory, you would find a time of the order of 10^-13 s for this process,
whereas the time necessary for the water molecules to rearrange is of the order of
10^-11 s. Thus I think the actual activation energy should be somewhere between the
adiabatic one and the one you obtain from the crossing between diabatic states. I'm
aware that you are not computing activation energies but free energies of reaction, but
is that enough to remove the difficulty?

WARSHEL - Sorry, but my treatment, which is quite complicated, is at present the most
rigorous treatment of the solute-solvent dynamic coupling. Please read J. Chem. Phys.,
1988 and J. Am. Chem. Soc. 1988, 110, 5297. Not only do we include the solvent in
the solute Hamiltonians, but we also developed an approach based on linear response
theory to simulate the solvent and solute dynamics. We have a treatment for both the
diabatic and adiabatic limits, and the SN2 reaction is entirely within the adiabatic limit;
also, the solvent dynamics is in the 10^-13 s limit. Please read our detailed study of this
in J. Am. Chem. Soc. 109, 715 (1987).


RIVAIL - The V.B. approach which you presented reminds me of a proposal of W. Miller
to approximate the height of the potential energy barriers.

WARSHEL - I am not aware of the exact work, but clearly more and more people are
realizing the power of the simple diabatic representation.


PERAHIA - The Langevin dipole model misses the correlation effects between
neighboring water molecules. A dielectric constant of 80 is obtained when such
correlations are considered. What are the implications of this limitation of the model for
your results?

WARSHEL - In one of the recent versions of our model we introduced the short-range
Kirkwood g factor in the Langevin model. This goes beyond the mean spherical
approximation. However, even with the original model, which has a dielectric constant of
around 20, we correctly obtain the self-energy of charges in proteins.


WIPFF - How did you parametrize the Ca2+ ion used for modelling
staphylococcal nuclease?

WARSHEL - The parameters for Ca2+ are given in Aqvist and Warshel, Biochemistry,
28, 4680 (1989).


HOJNACKI - Could you comment on the non-orthogonality problem of your empirical
valence bond method?

WARSHEL - Good question. In principle, we consider our parameters to be those obtained
from the V.B. Hamiltonian after the Löwdin orthogonalization procedure. Please see
eq. (8) in J. Am. Chem. Soc. 110, 5297 (1988).
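For a two-state (diabatic) picture like the one discussed in this exchange, the Löwdin step and the subsequent diagonalization can be sketched in Python. This is a generic illustration, assuming a 2×2 valence-bond Hamiltonian H and an overlap matrix S = [[1, s], [s, 1]]; it is not the EVB parameterization of the cited work, and the function names are hypothetical. Transforming H with S^(-1/2) yields an orthogonal-basis Hamiltonian with the same eigenvalues as the generalized problem Hc = ESc, and the lower eigenvalue of that 2×2 matrix is the adiabatic ground-state energy.

```python
import math


def lowdin_inverse_sqrt_2x2(s):
    """S^(-1/2) for the 2x2 overlap matrix S = [[1, s], [s, 1]] with |s| < 1."""
    a = 1.0 / math.sqrt(1.0 + s)  # eigenvalue 1 + s, eigenvector (1, 1)/sqrt(2)
    b = 1.0 / math.sqrt(1.0 - s)  # eigenvalue 1 - s, eigenvector (1, -1)/sqrt(2)
    p, q = (a + b) / 2.0, (a - b) / 2.0
    return [[p, q], [q, p]]


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def orthogonalize(H, s):
    """Löwdin-transformed Hamiltonian S^(-1/2) H S^(-1/2)."""
    X = lowdin_inverse_sqrt_2x2(s)
    return matmul(matmul(X, H), X)


def adiabatic_ground(H):
    """Lower eigenvalue of a symmetric 2x2 Hamiltonian in an orthogonal basis."""
    mean = (H[0][0] + H[1][1]) / 2.0
    half_gap = (H[0][0] - H[1][1]) / 2.0
    return mean - math.sqrt(half_gap ** 2 + H[0][1] ** 2)
```

For H = [[0, 0.5], [0.5, 0]] and s = 0.2, `adiabatic_ground(orthogonalize(H, 0.2))` gives -0.625, the lower root of det(H - ES) = 0.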
MODELLING THE ACETYLCHOLINE RECEPTOR (AChR) CHANNEL. ENERGY
PROFILES AND POINT MUTATIONS.


A. PULLMAN, S. FUROIS-CORBIN and A.M. ANDRADE

Institut de Biologie Physico-Chimique, 13 rue Pierre et Marie
Curie, Paris 75005 (France)

SUMMARY

A summary is given of the development of a molecular model of
the inner wall of the AChR channel compatible with the presently
existing structural knowledge (sequence, stoichiometry, etc.),
supplemented by conditions derived from recent labelling
experiments and by theoretical calculations. Energy profiles are
shown to explain the recently described effect of point mutations
on the channel conductance.


INTRODUCTION: THE MODEL

In the currently prevalent concept of the AChR channel (1),
five pentagonally disposed subunits (α β α γ δ) are believed to
participate in the transmembrane structure by four hydrophobic
helices MI-MIV and to contribute at least one helix, MII, to the
inner wall of the channel. Labelling experiments (2-5) with
noncompetitive blockers (NCB), identifying the labelled residues as
homologous serines in the five MII helices, have suggested (6-7)
that these residues face the interior of the pore, thereby fixing
the orientation of the MIIs with respect to the central axis.

Adopting these premises and using theoretical calculations,
a molecular model of the channel was built as follows: it was
first shown in a theoretical study (8) that, in order to satisfy
the suggestion (3) that the high-affinity site for the NCB
chlorpromazine lies at or near the level of the labelled serines
on the axis of quasi-symmetry of the receptor and at minimum
distances from all five chains, consecutive MII helices must be
laterally in contact at this level rather than separated by
another helix. It was shown furthermore (9) that this hypothesis
allows the calculation of the minimal distance between the helices in
this region. Moreover, adopting the reasonable assumption that the
large permeant ions or NCBs must diffuse through the upper part
of the channel, it was noted that the presence of bulky internal
side chains in this upper part of the inner wall would prevent
diffusion if the five MII helices were parallel: using the
sequences of the MIIs (figure 1) and assuming the Cα atoms of the
labelled serines (numbered 8) to face the center of the pore, the
"helix wheels" require (8) that the other Cα carbons situated on
the interior wall are those of residues 1, 4, 12, 15 and 19.
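The helical-wheel argument can be checked with a few lines of Python, assuming an ideal α-helix with 100° of rotation per residue (3.6 residues per turn) and, as an arbitrary choice for this sketch, counting a residue as facing the pore when its Cα lies within 40° of the labelled serine (residue 8 in the simplified numbering). The helper names are hypothetical.

```python
TURN_PER_RESIDUE = 100.0  # degrees per residue in an ideal α-helix (3.6 residues/turn)


def angular_offset(residue: int, reference: int) -> float:
    """Angle (degrees, in [-180, 180)) of a residue's Cα around the helix axis,
    measured from the Cα of the reference residue."""
    delta = (residue - reference) * TURN_PER_RESIDUE
    return (delta + 180.0) % 360.0 - 180.0


def facing_residues(n_residues: int, reference: int, half_width: float = 40.0):
    """Residues 1..n_residues whose Cα lies within +/- half_width of the reference."""
    return [r for r in range(1, n_residues + 1)
            if abs(angular_offset(r, reference)) <= half_width]
```

With these assumptions, `facing_residues(21, 8)` returns `[1, 4, 8, 12, 15, 19]`, the set quoted above; residue 22 would also fall inside the window if the stretch were extended.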
Fig. 1. The aligned sequences (standard numbering as indicated) of
the MII segments of Torpedo marmorata and Torpedo californica (18)
within the limits (between arrows) discussed in refs. 8 and 9; (*):
homologous serines labelled by [3H]CPZ and [3H]TPMP. For
practical reasons a simplified numbering is indicated on the top
line.

Residues 15 and 19, which are in the upper internal part of the
pore, are bulky valines, isoleucines and leucines. Conformational
optimizations and model building in this part of the helices
allowed the determination of the minimal approach of the MIIs at
this level compatible with the passage of chlorpromazine (9). This
condition, together with that imposed at the level of the labelled
serines, gives the channel wall a truncated conical shape (see
figure 2). The resulting tilt of the helices was calculated to be
7 degrees with respect to the central vertical axis of the pore.
This model leaves small gaps between adjacent MIIs in the upper
part of the channel wall, which can, however, be easily closed by
contact with another helix of each subunit, either MI or MIII (see
ref. 9 for a discussion).

Fig. 2. A view from the synaptic side of the disposition of the
five MII helices around the pore. The points labelled by letters
represent the location of the Cα carbons of the successive
residues in the α-helical structure. Numbering and amino acid code
as in figure 1. As discussed in (9), the last residue is the 21st
one and residue 22 (not numbered) is outside the helical stretch.

An important feature of the model, discussed in ref. (8), is
the inclusion in the α-helical portion of the MII segments of the
charged glutamates (Gln in γ) situated two residues below the
sulfur amino acids generally considered as the N-terminal groups.
Calculations of the energy profile of the largest permeant
ion, dimethyldiethanolammonium (DMDEA), showed that this inclusion
is fundamental in determining the possibility of exit of the ions
(8). This feature results from the α-helical structure which,
together with the internal positions of the labelled serines,
requires that the negatively charged glutamates (Gln in γ) also face
the interior of the pore while the adjacent positively
charged lysines point to the external side of the helices and thus do
not annihilate the favorable action of the negative residues.
These conclusions, hence the underlying hypothesis, have been
strikingly confirmed by the site-directed mutation experiments of
ref. (10), which showed, in particular, that mutating the
negatively charged residues considered above has a much stronger
effect on the channel conductance than mutating the other anionic
residues found between MI and MII or between MII and MIII, while
mutating the positively charged residues following the Glu (Gln)
positions leads to no significant change in channel conductance.
The dominance of the effect of the negative residues at positions
1 over that of the residues at positions 22 (see fig. 1) is in
very good agreement with the fact underlined in (9) that the upper
limit of the MII must be placed at residue 21, owing to the
presence of a proline four residues ahead in the sequence.

A recent calculation (11) of the energy profile for a sodium
cation in the model has confirmed and extended these previous
findings. It has furthermore indicated the role, in the profile,
of the other helices of the subunits. Thus, it was shown that when
only the five MII helices are used to calculate a profile, the
energy (even though negative, thus favorable) becomes less and
less favorable from top to bottom of the pore, a situation
disadvantageous for the passage of the cation. This reflects the
fact (7,9,11) that the variations of the binding energy are
determined by the variations of its electrostatic component. Thus,
due to the large dipole moment (about 74 Debye units) of each MII
α-helix, the conical bundle made of five nearly parallel MIIs
presents a very large dipole moment, with the negative pole
oriented towards the synaptic side and the positive pole towards
the cytoplasmic side, a situation which would be unfavorable for
the crossing of a positive cation if the MIIs were alone. In
the actual situation, however, each MII belongs to a subunit which
comprises four helices, alternately down and up from the synaptic
side to the cytoplasm, thus with alternately oriented dipole
moments. Thus, even though the MIIs are the closest to the pore,
the unfavorable effect of their parallel dipoles on the passage of
a positive cation is considerably reduced by the presence of the
other helices of the receptor: it was shown (11) that the effect
in this respect of the three supplementary helices in each subunit
could be mimicked by supplementing the cone of MIIs with another
cone of analogous α-helices antiparallel to the MIIs and placed
in contact with them just behind, as indicated in figure 3 (see
ref. 11 for the technical details).
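Why the other helices matter can be illustrated by summing helix dipoles as vectors. The sketch below assumes idealized straight helices, the 74 D dipole and 7° tilt quoted above for every helix, and an outer antiparallel cone of identical helices offset by 36°; these are simplifications, not the geometry of ref. 11, and the helper names are hypothetical. The five parallel MIIs add up to about 5 × 74 × cos 7° ≈ 367 D along the pore axis, whereas adding the antiparallel cone cancels the axial component.

```python
import math


def helix_dipole(magnitude, tilt_deg, azimuth_deg, pointing_up=True):
    """Dipole vector (Debye) of a straight helix whose axis is tilted by tilt_deg
    from the pore (z) axis, leaning towards azimuth_deg."""
    t, a = math.radians(tilt_deg), math.radians(azimuth_deg)
    s = 1.0 if pointing_up else -1.0
    return (s * magnitude * math.sin(t) * math.cos(a),
            s * magnitude * math.sin(t) * math.sin(a),
            s * magnitude * math.cos(t))


def total_dipole(dipoles):
    """Vector sum of a list of dipoles."""
    return tuple(sum(d[k] for d in dipoles) for k in range(3))


def cone(n, magnitude, tilt_deg, pointing_up=True, phase_deg=0.0):
    """n identical helices spaced evenly around the pore, all tilted by tilt_deg."""
    return [helix_dipole(magnitude, tilt_deg, phase_deg + 360.0 * k / n, pointing_up)
            for k in range(n)]
```

For example, `total_dipole(cone(5, 74.0, 7.0))` has negligible x and y components and a z component of about 367 D, while `total_dipole(cone(5, 74.0, 7.0) + cone(5, 74.0, 7.0, pointing_up=False, phase_deg=36.0))` is essentially zero.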


Fig. 3. Schematic view from the synaptic side of the mutual
disposition of two adjacent MII helices and of the five helices X
surrounding them in (a) the lower part and (b) the upper part of the
conical model of the channel. Arrows indicate the orientation of
the helices, from their N- to their C-terminal.

ENERGY PROFILES IN THE MODEL

The "energy profile" of Na+ computed in the double cone so
defined indicated that the interaction energy E of Na+ with the
whole "double" bundle (ten helices) is negative, thus favorable,
and presents a favorable evolution from top to bottom of the
channel (figure 4), a result of the now favorable evolution of its
electrostatic component.
Fig. 4. Energy profile of Na+ and its components in the double
cone of fig. 2: (solid line) total interaction energy E; (dashed line)
electrostatic component of E; (dotted line) polarization component of
E. The Lennard-Jones component is very small and is thus omitted from the
figure. The scale h indicates in Å the height of Na+ in the channel
(from 30, left: synaptic side, to 0, right: cytoplasmic side).
The arrows indicate the height of the Cα of the residues facing
the interior of the pore in MIIα (see fig. 1 for the homologs).


It is interesting to consider the effect on such a profile of
representative mutations experimentally tested (10) in the anionic
ring situated at the bottom of the MIIs. This is illustrated
below with two examples, namely the substitution of Glu 255 in δ by
Gln and of Glu 241 in α by Asp (residues 1 in the numbering of
figure 1).

The calculation of the energy profile was done as in (8, 9, 11-12)
by optimization of the energy of interaction (13) of the ion
with the whole channel in successive planes perpendicular to the
central axis, regularly and closely spaced. The "energy profile"
is the plot of this interaction energy as a function of the
progression of the ion. As in (8), the channel is maintained rigid.
The computations were performed in the double cone defined in ref.
11, as recalled above.
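A rigid-channel profile of this kind can be sketched as a minimization of the ion's interaction energy in each plane perpendicular to the axis. In the Python below, a plain grid search over a disk replaces the optimizer of ref. (13), and the energy is a bare Coulomb sum over point charges (no polarization or Lennard-Jones terms, dielectric constant 1); the function names and the 0.25 Å grid spacing are illustrative assumptions.

```python
import math

COULOMB_KCAL = 332.06  # Coulomb constant in kcal·Å/(mol·e²)


def coulomb_energy(pos, charges, eps=1.0):
    """Electrostatic energy (kcal/mol) of a +1 ion at pos with point charges (x, y, z, q)."""
    return sum(COULOMB_KCAL * q / (eps * math.dist(pos, (x, y, z)))
               for x, y, z, q in charges)


def energy_profile(energy, z_levels, r_max, step=0.25):
    """For each plane z, minimise energy((x, y, z)) by grid search over a disk of
    radius r_max (Å). Returns (z, E_min, x_opt, y_opt); the channel is kept rigid."""
    n = int(round(r_max / step))
    grid = [i * step for i in range(-n, n + 1)]
    profile = []
    for z in z_levels:
        e_min, x_opt, y_opt = min(
            (energy((x, y, z)), x, y)
            for x in grid for y in grid if x * x + y * y <= r_max * r_max)
        profile.append((z, e_min, x_opt, y_opt))
    return profile
```

For example, `energy_profile(lambda p: coulomb_energy(p, charges), z_levels, r_max=3.0)` returns one minimum per plane; plotting `E_min` against `z` gives the profile, and the `(x_opt, y_opt)` pairs trace a trajectory analogous to Fig. 7. The charges must lie outside the searched disk to avoid a zero distance.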

The mutation is introduced as follows:
according to energy optimizations carried out in the model channel
(8,9), Glu 1 and Lys 2 form a salt bridge in α, β and δ; Gln 1 and Lys
2 in γ are H-bonded. In keeping with this standpoint, for the
substitution of Glu by Gln in MIIδ, a conformation was adopted
in which the new Gln forms with Lys 2 a hydrogen bond analogous to
that of Gln 1 and Lys 2 in MIIγ. For the substitution of Glu by
Asp in MIIα, a real salt bridge like that formed by Glu and Lys in
the unsubstituted channel cannot be achieved, owing to the shorter
chain of the aspartate, but it appears reasonable to adopt a
structure in which one of the negatively charged carboxyl oxygens of
Asp is directed towards the positively charged end group of Lys in
an elongated bridge.

For simplicity we call W (wild type) the double cone of the
model where the sequences of the five MIIs are those of Fig. 1,
EδQ the same double cone where Glu 1 in MIIδ is replaced by Gln,
and EαD the double cone where Glu 1 in both MIIα's is replaced by
Asp.


Fig. 5. Energy profiles of Na+ in the double cone of figure 2. The
curves W and EδQ in full line are the profiles in the wild-type
channel and in the EδQ mutant respectively; the dotted line is
the profile in the EαD double mutant. Scale as in fig. 4.

Fig. 5 compares the profiles calculated in W, EδQ and EαD
respectively. The disposition of the curves W and EδQ shows the
effect of the substitution involved on the crossing of the ion:
the energy of interaction of Na+ with the bundle is considerably
affected. The modification occurs not only in the lower part
(close to the modified residue) but all the way through the
channel: the energy in EδQ is less negative, thus less favorable,
than in the wild type. As in W (11), the variations of the total
interaction energy of Na+ with EδQ are governed by the variations
of the electrostatic component of this energy, which is by far less
favorable in EδQ than in W, the other components of the
interaction energy being unaffected (Fig. 6, to be compared with
Fig. 4).

Fig. 6. Energy components (E, kcal/mol) of the profile in the mutant EδQ:
(solid line) total; (dashed line) electrostatic component; (dotted line)
polarization. As in W, the Lennard-Jones component, small everywhere, is not
given. Scale as in fig. 4.

If we consider, on the other hand, the profile obtained after
mutating Glu to Asp in the two α MIIs, we observe that it is
nearly indistinguishable from the profile obtained in the wild
type.

Thus, on the basis of the energy results, the transit of the
ion should be appreciably less favorable in the EδQ mutant than in
the wild type, whereas it should be about as favorable in the EαD
mutant as in the wild type: these results appear in remarkable
agreement with the observations of ref. (10), where indeed the EδQ
mutation lowered the channel conductance by about one half,
whereas the EαD mutation had very little effect.

The reasons for the variations observed in the global energy
profiles upon mutating towards EδQ can be still better understood
by considering the trajectory (Fig. 7) followed by the ion
together with the values of its interaction energies with each
helix individually along its path (Fig. 8). Very strikingly, the
successive optimal points reached by the ion in W and in EδQ are
essentially the same (whatever the starting point chosen for the
optimization). Moreover, descending towards the N-terminal, these
points rapidly cluster in the neighborhood of MIIβ: in the wild-type
channel, the cation is more and more attracted, all along its
progression from top to bottom of the channel, towards the region
located near MIIβ, where the conjunction of the attractive effects
of the four glutamates present is maximal. The presence of a Gln
instead of a Glu in MIIγ makes the region surrounding this helix
less favorable. The individual interaction energies of Na+ with
the different helices (Fig. 8, W) reflect its proximity to each of
them respectively: note the most favorable interaction with β,
the favorable but smaller (except at the extremes) interaction
with α1 and α2, the favorable but constantly small interaction
with δ and, finally, the unfavorable interaction with γ, increasing
towards the bottom of this helix, where no negative charge is
present to attract the ion.

Fig. 7. The optimal points reached by Na+ in the successive planes
studied.
(•) Upper part of the channel down to the level of the Cα of
residues 16 (according to the simplified numbering in Fig. 1).
(·) From the level of the Cα of residues 15 down to the level of
the Cα of residues 11.
(×) Next portion of the channel down to the N-terminal of the MIIs.
(line) Joins the traces of the axes of the MIIs at the bottom and
the top of the cone.
(a) Position of the free oxygen of Glu.

Fig. 8. Contribution of each MII helix to the energy profiles in W
and in EδQ, as indicated: the curves α1, α2, β, γ, δ correspond to
the interaction energy of Na+ with the respective helices (order
indicated in insert).

When Glu in δ is mutated to Gln, the convergence of the
attraction remains in the neighbourhood of β and the optimal
positions remain essentially the same, with interactions with β,
α1 and α2 similar to before. But now, the interactions with δ and with
γ are both unfavorable (Fig. 8, EδQ), more so towards the bottom of
the helices, which both lack a negative charge. Hence the overall
global energy is less favorable.
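The helix-by-helix analysis amounts to splitting the ion's interaction energy into per-helix sums. The sketch below assumes point charges only and mimics Glu to Gln by simply zeroing one site's charge, a crude stand-in for the actual substitution (the Gln side chain, its H-bond to Lys 2 and the polarization term are all ignored); the helper names are hypothetical.

```python
import math

COULOMB_KCAL = 332.06  # Coulomb constant in kcal·Å/(mol·e²)


def per_helix_energies(ion_pos, helices, eps=1.0):
    """Electrostatic energy (kcal/mol) of a +1 ion with each helix's point charges.
    helices maps a helix name to a list of ((x, y, z), q) sites."""
    return {name: sum(COULOMB_KCAL * q / (eps * math.dist(ion_pos, site))
                      for site, q in charges)
            for name, charges in helices.items()}


def neutralize(helices, name, index):
    """Crude Glu -> Gln mimic (hypothetical helper): copy the model and zero
    the charge of one site, leaving the original untouched."""
    mutant = {k: list(v) for k, v in helices.items()}
    site, _ = mutant[name][index]
    mutant[name][index] = (site, 0.0)
    return mutant
```

Summing the dictionary values gives the total electrostatic energy at a given ion position, so comparing `per_helix_energies(pos, wt)` with `per_helix_energies(pos, neutralize(wt, "delta", 0))` at the same optimal points shows which helix accounts for the loss of stabilization.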

When Glu is mutated into Asp in the two α's, the situation is
very similar to that of the wild type, both in the path of the ion
and in the individual interactions: in the mutated channel, in
spite of the shorter length of the Asp side chains, the field
which they create on the ion is still strong, and the convergence
of the effects of αAsp, βGlu, αAsp is clearly similar to that of
αGlu, βGlu, αGlu.

CONCLUDING REMARKS

The energy profiles calculated for the mutants E255Q in δ and
E241D in α of the AChR channel using the proposed model of the
channel inner wall account remarkably well for the variations in
the observed conductance with respect to that of the wild type.
The analysis of the data confirms and specifies the role proposed
(8) for the negatively charged residues at the N-terminal of the
MII helices. The agreement of the profiles calculated within the
model with the conclusions of the conductance experiments appears
as a confirmation of the essential structural hypotheses of this
model. Although the important role of the N-terminal negative
residues seems well established both by theory (8,11) and by
experiment (10), it is possible that they depart somewhat from the
bridge structures found by energy optimization in vacuo (14). The
conjunction of theoretical calculations and experimental
observations on other mutants (10,15) should help further
refinement of the model.

Until a high-resolution crystal structure of the
transmembrane part of the receptor becomes available, progress
towards understanding its functioning relies heavily on
further modelling in close contact with experiment. We hope
that the present model can serve as a starting point.

DISCUSSION


WIPFF - The interaction energy of Na+ with the bundle built up from neutral poly-Ala
residues is quite large and attractive (about -50 kcal/mol) but is only about half of the
dehydration energy of the ion. Where would you suggest the additional required
50 kcal/mol could come from for passing through the channel in water?

A. PULLMAN - The calculation of the profile for a bare Na+ in the package of five poly-Ala
helices was done in order to explore whether a bundle of entirely hydrophobic nonpolar
helices could accommodate a cation in spite of the absence of polar residues inside
the pore, the common belief at the time of the calculation (BBA, 1986, ref. 13 of the
manuscript) being that polar and even ionized residues were necessary. We did not
claim to address the problem of entrance and/or desolvation.

I mentioned the role of polar (non-ionized) residues when present (like serines, which
are often there in small number). They provide extra binding energy and/or can help
to disturb the water structure locally (J. Mol. Str. and Dynamics 1987, 4, 589-598).
Entrance desolvation is probably progressive and helped by some polar (often ionic)
groups.


PERAHIA - Your model does not contain water molecules inside the channel, which
may have important screening effects. The electrostatic energy variations will be much
smaller if you take screening effects into account, and the electrostatic energy profile
may turn out to be different.

A. PULLMAN - We have not yet introduced water explicitly. However (particularly for
the last problem considered), we have done the same calculations with different
(tentative) values of the dielectric constant: although it truly affects the values of the
energies, hence the profile, by screening the electrostatic component, it did not affect
the overall results concerning the location of the trajectory of Na+ and the reasons for it.
It must be added that various experimental data indicate that only a few water
molecules are in the narrow (lower) part of the AChR channel, hence only relatively small
screening can occur for the effect of the four glutamates which we advocate.
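The point that a uniform dielectric constant rescales energies without necessarily moving the trajectory can be made concrete with a toy comparison. Assuming, for illustration only, that the electrostatic term alone is divided by ε, the position of the minimum among candidate points is unchanged when that term dominates but can move when other terms compete with it. The helper names below are hypothetical.

```python
def screened_profile(elec, other, eps):
    """Total energies at candidate points when only the electrostatic term
    is divided by a uniform dielectric constant eps."""
    return [e / eps + o for e, o in zip(elec, other)]


def argmin(values):
    """Index of the lowest value (first one in case of ties)."""
    return min(range(len(values)), key=values.__getitem__)
```

With arbitrary electrostatic terms [-40, -60, -50] kcal/mol and no other contribution, the minimum stays at the second point for both ε = 1 and ε = 4; adding arbitrary non-electrostatic terms [-10, 0, -2] moves it to the first point at ε = 4.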
MAROUN - In the energy profile of the passage of the Na+ ion from the top to the
bottom of the channel, there is an energy barrier of about 15 kcal around residue 8.
How can the ion, then, go through this middle-of-the-channel barrier?

A. PULLMAN - As I said, the model does not include complete optimization (of the side
chain conformations or of the bundle itself), either without or with the ion. Such
optimizations can very easily lower or abolish small barriers. This has been explicitly
shown in our early calculations on the model bundles with Ala, Leu, Ser and Glu (see
our BBA papers of 1986-1987-1988).


BUCKINGHAM - It is thought that entropic effects play an important role in hydrophobic
interactions. Do you think they are important in transport through membranes?

A. PULLMAN - This is a question we have not addressed so far.
ANALYSING AND MODELLING THE DEFORMATION OF PROTEIN SECONDARY
STRUCTURES


C. ETCHEBEST

Laboratoire de Biochimie Theorique, Institut de Biologie Physico-Chimique,
13, rue Pierre et Marie Curie, Paris 75005 (France)


ABSTRACT

A new method able to perform controlled deformations of
oligopeptides with the help of geometrical constraints is
described. The deformation energy for two types of deformation
applied to α-helices with three different amino acid sequences has
been computed. The results show that this approach, coupled with
the P-Curves method for analysing polypeptide structure, is
particularly appropriate for producing global modifications of
conformation.


INTRODUCTION

Secondary structures, most notably α-helices and β-sheets,
form a considerable part of most protein conformations. However,
these structural elements rarely approach perfect helicoidal
symmetry, and their participation in the overall conformation often
involves considerable deformations. In order to understand the
energetic nature of these effects, we have developed a technique
that is able to produce easily controlled deformations of basic
structural elements and enables us to perform theoretical
calculations of deformation energy. In order to study the
influence of the peptide sequence and to clarify, in particular,
the role of proline, we have investigated three different α-helical
sequences of 21 amino acids: (Ala)21, (Val)21 and
(Ala)10-Pro-(Ala)10.

METHOD 

In order to apply constraints between any two chosen peptide units i and j, we first set up an axis system U, V, W centred at point P, which indicates the location of the helical axis system in the starting conformation (Fig. 1). The four variables (two translations and two rotations) necessary to fix this axis system have already been described in our previous work concerning the P-Curves algorithm (refs. 1-2).


Fig. 1: Definition of the axis systems for peptides i and j.

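These axis systems can now be used to construct various geometrical constraints:

1) Stretch or lateral dislocation between peptide i and peptide j: $(\mathbf{P}_i - \mathbf{P}_j)\cdot\mathbf{X}_i$, where $\mathbf{X}_i$ is $\mathbf{U}_i$, $\mathbf{V}_i$ or $\mathbf{W}_i$.

2) Helical axis bend between peptide i and peptide j: $\mathbf{U}_i\cdot\mathbf{U}_j$.

3) Bending direction: $\mathbf{U}_j\cdot\mathbf{V}_i \,/\, |\mathbf{U}_j - (\mathbf{U}_i\cdot\mathbf{U}_j)\mathbf{U}_i|$, the sign being defined by $(\mathbf{U}_j\times\mathbf{V}_i)\cdot\mathbf{U}_i$.

4) Helical twist angle: $\mathbf{V}_i\cdot\mathbf{V}_j$ (keeping $\mathbf{U}_i$ and $\mathbf{U}_j$ collinear), the sign being defined by $(\mathbf{V}_i\times\mathbf{V}_j)\cdot\mathbf{U}_i$.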
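As an illustration only (not the authors' program), the four quantities listed above can be evaluated as follows for two peptide frames. The sketch assumes that each frame is given by its origin P and orthonormal unit vectors U (along the local helical axis) and V, with W = U × V; the helper name `constraint_quantities` is hypothetical, and the signs follow the definitions exactly as written above.

```python
import numpy as np


def constraint_quantities(P_i, U_i, V_i, P_j, U_j, V_j):
    """Constraint values between the axis frames of peptides i and j (hypothetical helper).

    Each frame is an origin P and orthonormal unit vectors U (local helical axis)
    and V; W = U x V is assumed (right-handed frame). Angles are in degrees.
    """
    P_i, U_i, V_i = (np.asarray(x, float) for x in (P_i, U_i, V_i))
    P_j, U_j, V_j = (np.asarray(x, float) for x in (P_j, U_j, V_j))
    W_i = np.cross(U_i, V_i)
    d = P_i - P_j                      # order (P_i - P_j) as written in constraint 1

    # 1) stretch along U_i, lateral dislocations along V_i and W_i
    stretch, disloc_v, disloc_w = d @ U_i, d @ V_i, d @ W_i

    # 2) helical axis bend: the angle whose cosine is U_i . U_j
    bend = np.degrees(np.arccos(np.clip(U_i @ U_j, -1.0, 1.0)))

    # 3) bending direction: part of U_j perpendicular to U_i, projected on V_i
    perp = U_j - (U_i @ U_j) * U_i
    norm = np.linalg.norm(perp)
    if norm < 1e-12:
        direction = 0.0                # undefined when the two axes are parallel
    else:
        cos_dir = np.clip((U_j @ V_i) / norm, -1.0, 1.0)
        sign = -1.0 if np.cross(U_j, V_i) @ U_i < 0 else 1.0
        direction = sign * np.degrees(np.arccos(cos_dir))

    # 4) twist: signed angle between V_i and V_j (meaningful when U_i, U_j are collinear)
    tw_sign = -1.0 if np.cross(V_i, V_j) @ U_i < 0 else 1.0
    twist = tw_sign * np.degrees(np.arccos(np.clip(V_i @ V_j, -1.0, 1.0)))

    return {"stretch": stretch, "disloc_v": disloc_v, "disloc_w": disloc_w,
            "bend": bend, "direction": direction, "twist": twist}
```

In a constrained minimisation, one of these values (for instance the bend between residues 2 and 20) would be held at a target while the rest of the structure relaxes.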

These geometrical constraints are imposed during the minimisation of the conformational energy, whose formula contains five terms: electrostatic, short-range repulsion, dispersion, dihedral angle rotation and valence angle distortion (see ref. 3 for details). For this purpose, analytic first derivatives of the active constraints are calculated and used, together with the energy and its derivatives, by an advanced variable metric constrained minimisation routine. In the case of the polypeptide containing a proline, the computations are made with a flexible proline ring, ring closure being ensured, again, by a geometrical equality constraint (ref. 4).

The starting conformations for the (Ala)21 and (Val)21 helices were obtained with a standard set of φ, ψ and ω angles (ref. 5), respectively -51°, -47° and 180°, and with χ angles equal to 180°. For the proline residue the values were -74.2°, -47.0° and 180.0°, and the χ angles of the ring were taken from the statistically most probable conformation (ref. 6). The deformation energy is given by the difference between the optimal conformational energies of the constrained structure and of the freely minimised reference structure.
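The definition of the deformation energy (constrained optimum minus free optimum) can be illustrated on a deliberately simple toy model. In the sketch below, which is a stand-in and not the five-term force field or the variable metric routine of the text, the helix is replaced by a one-dimensional chain of 21 beads joined by harmonic springs (force constant `k`, rest length `l0`), and the stretch constraint between beads 2 and 20 is imposed exactly through Lagrange multipliers. Because the toy energy is quadratic, each constrained minimum is the solution of a single linear (KKT) system. All names and numbers are hypothetical.

```python
import numpy as np


def minimise_harmonic_chain(n_beads, k, l0, constraints):
    """Minimise E(z) = 0.5*k*sum((z[m+1] - z[m] - l0)**2) subject to linear equalities.

    Toy model (hypothetical): beads at positions z along one axis, harmonic springs.
    constraints: list of ({bead_index: coefficient}, value), each meaning
    sum(coefficient * z[bead_index]) == value. Since E is quadratic, the
    Lagrange-multiplier (KKT) conditions form one linear system.
    Returns (z, energy).
    """
    D = np.zeros((n_beads - 1, n_beads))           # spring extensions are D @ z
    for m in range(n_beads - 1):
        D[m, m], D[m, m + 1] = -1.0, 1.0
    H = k * D.T @ D                                 # Hessian of E
    g = k * l0 * D.T @ np.ones(n_beads - 1)         # E = 0.5 z.H.z - g.z + const
    nc = len(constraints)
    A = np.zeros((nc, n_beads))
    c = np.zeros(nc)
    for r, (coefs, value) in enumerate(constraints):
        for idx, coef in coefs.items():
            A[r, idx] = coef
        c[r] = value
    kkt = np.block([[H, A.T], [A, np.zeros((nc, nc))]])
    z = np.linalg.solve(kkt, np.concatenate([g, c]))[:n_beads]
    return z, 0.5 * k * np.sum((D @ z - l0) ** 2)


def deformation_energy(n_beads, k, l0, i, j, target):
    """Optimal constrained energy minus optimal free energy, for z[j] - z[i] = target."""
    anchor = ({0: 1.0}, 0.0)                        # fixes the free translation only
    _, e_free = minimise_harmonic_chain(n_beads, k, l0, [anchor])
    stretch = ({j: 1.0, i: -1.0}, target)
    _, e_con = minimise_harmonic_chain(n_beads, k, l0, [anchor, stretch])
    return e_con - e_free


# Residues 2 and 20 are 0-based indices 1 and 19 (18 springs apart); stretch by 4 units.
dE = deformation_energy(21, 9.0, 1.5, 1, 19, 18 * 1.5 + 4.0)
```

For 18 identical springs in series the toy result is ½(k/18)(ΔL)², so `dE` equals 4.0 in the toy's energy units; the outer springs relax completely and contribute nothing.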

During optimisation, the axis systems defined above are locked to their respective peptides. Once the final structure is obtained, a P-Curves analysis is made to find the optimal helical description of the now distorted oligopeptide. This analysis can lead to changes in the desired constraint geometry (e.g. bending angle), but these changes are, in general, negligible.

Two types of deformation have been studied for the three sequences mentioned above: stretch and bending angle (in two directions). The constraints have been imposed between residues 2 and 20 of the sequence.

RESULTS 

Before studying the effect of the various deformations, we analysed the optimised conformations of the three sequences with the P-Curves algorithm. The results show that the (Ala)21 and (Val)21 helices are linear, whereas the sequence containing a proline is naturally curved, the corresponding bending angle between residues 2 and 20 being 17°.

Let us now consider the results obtained when the structures are stretched. Fig. 2 shows the energy variations for the sequences (Ala)21 and (Val)21 (curves A and B respectively) as a function of stretching. In both cases the deformation energy increases rapidly, reaching roughly 40 kcal/mol for a 4 Å elongation. This energy loss is due to the progressive disruption of the network of hydrogen bonds. Nevertheless, the breaking is smooth, the hydrogen bonds being lengthened and becoming more bent. In some cases this is compensated by the formation of new hydrogen bonds of the 1-3 type (3₁₀ helix), situated at the ends of the oligopeptides. This situation is more marked for
The results for the third sequence, containing a proline residue, are shown in Fig. 3. The energy variation is smaller than that for (Ala)21 or (Val)21, so the presence of a proline residue clearly facilitates this deformation. The mechanism involved in stretching is the same: hydrogen bonds of the 1-3 type are formed, but they are located in the immediate vicinity of the proline rather than at the end of the structure. Moreover, we observed a modification of the global curvature of the structure.

Concerning the bending angle constraint, we studied two perpendicular directions for the three sequences. Fig. 4 illustrates the results for (Ala)21, the two curves representing the two directions of curvature. The difference between the two curves indicates a bending anisotropy of almost 2 kcal/mol. This anisotropy may be attributed to the angular distribution of the hydrogen bonds within the α-helical structure of the oligopeptide.


Fig. 4: Deformation energy with respect to the bending angle in two perpendicular directions for (Ala)21.

The form of both curves is nevertheless similar and quadratic. The mean force constant computed is 1.25×10⁻² kcal/mol/Å². The energy loss reaches roughly 10 kcal/mol for a bending angle of 30-35°. Bending occurs through the progressive deformation of the hydrogen bonds but, in contrast to the stretching deformations, the number and type of these bonds are conserved.
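A mean force constant of this kind can be obtained by a least-squares fit of the deformation energies to a pure quadratic in the bending angle. The sketch below assumes the form E = kθ² (θ in degrees, no factor ½) with the fit forced through the origin; the text does not state which convention was used, so this is only one plausible reading, and the function names are hypothetical. The anisotropy between the two bending directions is taken as the largest gap between the two curves at equal angles.

```python
import numpy as np


def bending_force_constant(angles_deg, energies):
    """Least-squares k for E = k * theta**2 (assumed form, theta in degrees).

    The fit goes through the origin because the deformation energy of the
    unconstrained reference helix is zero by definition.
    Returns (k, residuals).
    """
    th2 = np.asarray(angles_deg, float) ** 2
    E = np.asarray(energies, float)
    k = (th2 @ E) / (th2 @ th2)        # normal equation of the one-parameter fit
    return k, E - k * th2


def bending_anisotropy(E_dir1, E_dir2):
    """Largest energy gap between two bending directions sampled at the same angles."""
    return float(np.max(np.abs(np.asarray(E_dir1, float) - np.asarray(E_dir2, float))))
```

Large residuals from this fit would signal non-quadratic behaviour of the kind reported below for the proline-containing sequence.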

The results for (Val)21 are shown in Fig. 5 and are globally similar to those for (Ala)21 (mean force constant for bending = 1.1×10⁻² kcal/mol/Å²). The larger valine side chains thus appear to play a smaller role in bending than in stretching.


[Plot: ΔE (kcal/mol) versus bending angle]


Fig. 5: Deformation energy with respect to the bending angle in two perpendicular directions for (Val)21.

The final results, concerning the sequence (Ala)10-Pro-(Ala)10, are given in Fig. 6. First, the deformation energy is only 4-7 kcal/mol for 35° of bending, the value depending on the direction. These values are much smaller than those obtained with the other sequences, which indicates that the presence of the proline facilitates α-helix bending. Nevertheless, as the results show, this bending is anisotropic. A second difference lies in the shape of the bending curves: in both directions the deformation energy is not quadratic. Despite these differences, the mechanisms involved are the same as those described previously and consist of the progressive deformation of the hydrogen bonds.
The effect of the geometrical bending constraint can be visualised more easily with the help of Fig. 7. This figure shows the molecular diagrams of the optimised structures computed for three different bending angles for the sequence (Ala)10-Pro-(Ala)10. The hydrogen bonds are indicated by dotted lines. The corresponding helical axis and the peptide backbone (represented as a ribbon), computed with the P-Curves method, are shown on the right-hand part of the figure.


CONCLUSIONS

The results obtained in this study have brought to light several properties of the α-helix:

1) in all cases the energy increases rapidly with stretching;

2) for (Val)21 and (Ala)21, the variation of energy with respect to bending is quadratic;

3) when a proline is included in the helix, bending becomes much easier but shows a marked anisotropy.

This simple study demonstrates the use of appropriately defined geometrical constraints within an internal coordinate formalism. Future applications of this technique will allow us to define sequence-dependent flexibilities for polypeptide structural motifs, which can subsequently serve for the construction of simplified, large-scale models of protein architecture.
MONTE CARLO FREE ENERGY CALCULATIONS IN CONFORMATIONAL STATISTICS OF POLYPEPTIDE CHAINS.
SUMMARY

The partition of the total conformational energy of a chain molecule into short-range and long-range interactions allowed us to propose Monte Carlo procedures based on the simultaneous use of statistical weight matrices for the short-range terms and importance sampling for the long-range interactions. Such an approach also provides a way of defining reference states of the molecular system. It therefore becomes possible to study the behavior of chain molecules under more complex conditions. The free energy and entropy variations due to long-range interactions are then calculated. The present calculation methods are applied to molecular models of peptidic hormones.


INTRODUCTION 

Excluded volume, solvent effects and electrostatic interactions are mainly responsible for the folding of polypeptide chains. However, these long-range interactions introduce major difficulties (refs. 1,2) because they prevent us from expressing the conformational energy as a sum of terms associated with each chain unit or with first-neighbor pairs. Moreover, besides the commonly evaluated configurational energy, the stability of macromolecules also depends on entropic contributions or, more generally, on the free energy of the system. But there is generally no direct access to such quantities, especially if the molecular system is subject to solvent effects.

The partition of the total configurational energy of a chain molecule into a part associated with short-range interactions and another associated with long-range interactions makes it possible (ref. 3) to define reference systems for the chain molecules, since all the statistical properties of the system are calculable when only short-range interactions are considered. These reference systems are then used (ref. 4) to calculate the entropy and free energy variations due to the effects of long-range interactions on the chain conformation. Models of polypeptide chains, and particularly of peptidic hormones, are taken as examples of application of the proposed calculation methods.


METHODS 

Conformational energy


Atomic coordinates of polypeptide chain conformations are computed using standard bond lengths and bond angles (ref. 5). The different residues of the peptide chain are classified, as described previously (ref. 3), into five types: GLY, PRO, ALA (for amino acids with a side chain), an N-terminal unit and a C-terminal one. Every residue in a polypeptide chain has a definite type, which corresponds to a precise energy map. The different maps are divided into regions, or unit states.

The conformational energy of a polypeptide chain of n units is defined as the sum of all pairwise nonbonded (6-12 potential) and electrostatic interactions between atoms, plus the torsional energy. The parameters and expressions of the energy contributions presently used have been described elsewhere (ref. 5).

Let us thus consider the following expression of the total energy of a polypeptide chain in configuration η:

$$E_T^{\eta} = \sum_{i=1}^{n} E_i^{t} + \sum_{i=2}^{n} E_{i-1,i}^{r,t} + E_L^{\eta} = E_S^{\eta} + E_L^{\eta} \qquad (1)$$

$E_i^{t}$ corresponds to the contributions to the configurational energy from interactions between atom pairs within unit (i) at the point t of its (φ,ψ) map. The terms $E_{i-1,i}^{r,t}$ are related to interactions between atoms located in two nearest-neighbor units in conformations defined by the points r and t of their respective maps. The two summations constitute the short-range interaction energy $E_S$, whereas $E_L$ represents the energy of long-range interactions between atom pairs situated in units more distant along the chain than first neighbors.


Chains with first-neighbor interacting units

For molecular chains subject only to short-range interactions, the configurational partition function can be written as (with the Boltzmann constant $k_B$ and temperature T):

$$Z_S = \sum_{(C)} \sum_{\eta \in C} \exp(-\beta E_S^{\eta}), \qquad \beta = 1/k_B T \qquad (2)$$


(C) is the set of chain conformations defined by all the different sequences of map regions that can be given to the chain units, whereas the subset C is defined by giving a precise map region to every chain unit. The configuration η corresponds to a sequence of (φ,ψ) points taken in these precise regions. If there are $n_i^{\ell}$ points (φ,ψ) in the region ℓ given to unit (i) and $n_{i-1}^{k}$ in the region k allocated to (i−1), we have the following averages:

$$a_i^{\ell} = \frac{1}{n_i^{\ell}} \sum_{t \in \ell} \exp(-\beta E_i^{t}) \quad\text{and}\quad w_{i-1,i}^{k\ell} = \frac{1}{n_{i-1}^{k}\, n_i^{\ell}} \sum_{r \in k} \sum_{t \in \ell} \exp(-\beta E_{i-1,i}^{r,t}) \qquad (3)$$


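By substituting these averages over map regions for the corresponding terms in $\exp(-\beta E_S)$, we can replace this last expression, for a subset C, by:

$$\overline{\exp(-\beta E_S)}_C = \prod_{i=1}^{n} a_i^{\ell} \prod_{i=2}^{n} w_{i-1,i}^{k\ell} \qquad (4)$$

Then, with the degeneracy $G_C = \prod_{i=1}^{n} n_i^{\ell}$ of the subset C of conformations,

$$Z_S = \sum_{(C)} G_C\, \overline{\exp(-\beta E_S)}_C = \sum_{(C)} \prod_{i=1}^{n} n_i^{\ell} a_i^{\ell} \prod_{i=2}^{n} w_{i-1,i}^{k\ell} \qquad (5)$$

This relation can be written as:

$$Z_S = \mathbf{U} \Big( \prod_{i=2}^{n} \mathbf{U}_i \Big) \mathbf{J} \qquad (6)$$

with the matrix $\mathbf{U}_i = [u_i^{k\ell}] = [n_i^{\ell} a_i^{\ell} w_{i-1,i}^{k\ell}]$, the row vector $\mathbf{U} = [u_1^{\ell}] = [n_1^{\ell} a_1^{\ell}]$ for $i = 1$, and the unit column vector $\mathbf{J}$.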

Monte Carlo samples of molecular conformations are obtained using these statistical weight matrices (ref. 3). The a priori probability that unit (i) is in the map region or state ℓ can be computed using the relation (ref. 1):

$$p_i^{\ell} = Z_S^{-1}\, \mathbf{U} \Big( \prod_{h=2}^{i-1} \mathbf{U}_h \Big) \mathbf{U}'_{i,\ell} \Big( \prod_{h=i+1}^{n} \mathbf{U}_h \Big) \mathbf{J} \qquad (7)$$

where the matrix $\mathbf{U}'_{i,\ell}$ is deduced from $\mathbf{U}_i$ by replacing all elements by zero except those of column ℓ.
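Equations (6) and (7) reduce to a few vector-matrix products. In the minimal sketch below (an illustration, not the program of ref. 3), `u1` holds the row vector U and `Us` the matrices U_2 to U_n; the probability of eq. (7) is computed as the product of a forward and a backward vector, which is equivalent to inserting the column-restricted matrix U'_{i,ℓ}. A brute-force enumeration over all region sequences, feasible only for very short chains, is included as a check. Function names and the test values are hypothetical.

```python
import itertools
import numpy as np


def partition_function(u1, Us):
    """Z_S = U (U_2 ... U_n) J, eq. (6); u1 is the row vector U, Us = [U_2, ..., U_n]."""
    v = np.asarray(u1, float)
    for U in Us:
        v = v @ np.asarray(U, float)
    return float(v.sum())              # right-multiplication by the unit vector J


def occupancy(u1, Us):
    """p[i, l] = probability that unit i (0-based) is in region l, eq. (7).

    f_i = U U_2 ... U_i (forward) and b_i = U_{i+1} ... U_n J (backward), so
    p_i^l = f_i[l] * b_i[l] / Z_S, which equals inserting U'_{i,l} in eq. (7).
    """
    u1 = np.asarray(u1, float)
    Us = [np.asarray(U, float) for U in Us]
    f = [u1]
    for U in Us:
        f.append(f[-1] @ U)
    b = [np.ones_like(u1)]
    for U in reversed(Us):
        b.append(U @ b[-1])
    b.reverse()
    Z = f[-1].sum()
    return np.array([fi * bi for fi, bi in zip(f, b)]) / Z


def brute_force_occupancy(u1, Us):
    """Same probabilities by enumerating every region sequence (tiny chains only)."""
    K, n = len(u1), len(Us) + 1
    p, Z = np.zeros((n, K)), 0.0
    for seq in itertools.product(range(K), repeat=n):
        weight = u1[seq[0]]
        for i in range(1, n):
            weight *= Us[i - 1][seq[i - 1], seq[i]]
        Z += weight
        for i, l in enumerate(seq):
            p[i, l] += weight
    return p / Z, Z
```

The matrix route costs of order n K² operations for n units and K regions, whereas enumeration grows as Kⁿ; this is why the number of map regions fixes the cost of the method.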

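For a chain molecule of n units subject only to nearest-neighbor interactions, the energy of a conformation taken in the subset C can be written as (ref. 4):

$$E_C = \varepsilon_1^{\ell} + \sum_{i=2}^{n} \varepsilon_i^{k\ell}$$

with $\varepsilon_1^{\ell} = -\beta^{-1} \ln a_1^{\ell}$ and $\varepsilon_i^{k\ell} = -\beta^{-1} \ln (a_i^{\ell}\, w_{i-1,i}^{k\ell})$.

The average of $E_C$ over the complete set of chain configurations is thus given by:

$$\langle E_S \rangle = \langle \varepsilon_1 \rangle + \sum_{i=2}^{n} \langle \varepsilon_i \rangle \qquad (8)$$

with

$$\langle \varepsilon_i \rangle = Z_S^{-1}\, \mathbf{U} \Big( \prod_{h=2}^{i-1} \mathbf{U}_h \Big) \mathbf{U}_i^{*} \Big( \prod_{h=i+1}^{n} \mathbf{U}_h \Big) \mathbf{J} \qquad (9)$$

and

$$\langle \varepsilon_1 \rangle = Z_S^{-1}\, \mathbf{U}^{*} \Big( \prod_{h=2}^{n} \mathbf{U}_h \Big) \mathbf{J} \qquad (10)$$

where the element (k,ℓ) of the matrix $\mathbf{U}_i^{*}$ is equal to $u_i^{k\ell} \varepsilon_i^{k\ell}$ and the element ℓ of the row vector $\mathbf{U}^{*}$ is $u_1^{\ell} \varepsilon_1^{\ell}$.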

The configurational free energy $F_S$ and entropy $S_S$ of the chain molecule can then be obtained from the following relationships:

$$F_S = -\beta^{-1} \ln Z_S, \qquad S_S = (\langle E_S \rangle - F_S)/T \qquad (11)$$
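The averages of eqs. (8)-(10) and the relations (11) follow the same pattern as eq. (7), with one factor replaced by its starred counterpart. The sketch below (hypothetical names; energies in the units of 1/β) builds u and ε from the per-region quantities n, a and w defined above and returns ⟨E_S⟩, F_S and TS_S.

```python
import numpy as np


def short_range_thermo(npts, a, w, beta):
    """<E_S>, F_S and T*S_S for the nearest-neighbor model, eqs. (8)-(11).

    npts[i][l] = n_i^l and a[i][l] = a_i^l for units i = 0..n-1 (0-based);
    w[i-1][k, l] = w_{i-1,i}^{kl} couples unit i-1 (region k) to unit i (region l).
    Energies come out in the units of 1/beta.
    """
    npts = [np.asarray(x, float) for x in npts]
    a = [np.asarray(x, float) for x in a]
    w = [np.asarray(x, float) for x in w]
    n = len(a)
    u1 = npts[0] * a[0]                                            # row vector U
    Us = [npts[i] * a[i] * w[i - 1] for i in range(1, n)]          # U_i[k, l] = n_i^l a_i^l w^{kl}
    eps1 = -np.log(a[0]) / beta                                    # epsilon_1^l
    epss = [-np.log(a[i] * w[i - 1]) / beta for i in range(1, n)]  # epsilon_i^{kl}

    f = [u1]                                   # forward products U U_2 ... U_i
    for U in Us:
        f.append(f[-1] @ U)
    b = [np.ones_like(u1)]                     # backward products U_{i+1} ... U_n J
    for U in reversed(Us):
        b.append(U @ b[-1])
    b.reverse()
    Z = f[-1].sum()

    mean_E = (u1 * eps1) @ b[0]                # numerator of eq. (10): U* replaces U
    for i in range(1, n):
        mean_E += f[i - 1] @ (Us[i - 1] * epss[i - 1]) @ b[i]   # numerator of eq. (9)
    mean_E /= Z
    F = -np.log(Z) / beta                      # eq. (11)
    return float(mean_E), float(F), float(mean_E - F)   # T*S_S = <E_S> - F_S
```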


Chain models with short- and long-range interactions

The total conformational energy $E_T$ of a chain molecule is the sum of the short-range and long-range interaction energies. The partition function of the chain is then given by:

$$Z_T = \sum_h \exp(-\beta \{E_S + E_L\}) \quad\text{while}\quad Z_S = \sum_h \exp(-\beta E_S) \qquad (12)$$

The summation is carried out over all the states h of the molecular system. The statistical average of a function f of the conformation should, in principle, be calculated with the relation:

$$\langle f \rangle = \sum_h f_h \exp(-\beta E_T^{h}) / Z_T \qquad (13)$$


But such quantities can only be evaluated by molecular dynamics or Monte Carlo sampling. In the latter case, the calculation procedure we propose (ref. 3) consists of choosing chain conformations in proportion to statistical weights obtained from a chain model including only short-range interactions, and then keeping or rejecting these conformations according to an importance sampling based on the complementary part of the total energy, which comes exclusively from long-range interactions.
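One way to realise the keep-or-reject step described above is an independence Metropolis sampler. Trial region sequences are drawn exactly from the short-range model (using the matrices U). Because the proposal distribution is that reference model, the acceptance test involves only the change in long-range energy, min(1, exp(−βΔE_L)). This is a sketch under that assumption; the original procedure of ref. 3 may differ in detail, and the function names, toy weights and `E_L` used in the check are hypothetical.

```python
import numpy as np


def _backward(u1, Us):
    """b[i] = U_{i+1} ... U_n J for each unit i (0-based); b[-1] = J."""
    b = [np.ones(len(u1))]
    for U in reversed(Us):
        b.append(U @ b[-1])
    return b[::-1]


def _draw(weights, rng):
    """Index drawn with probability proportional to the non-negative weights."""
    c = np.cumsum(weights)
    return int(np.searchsorted(c, rng.random() * c[-1], side="right"))


def sample_reference(u1, Us, b, rng):
    """Region sequence with probability proportional to u_1 * prod u_i (short-range model)."""
    seq = [_draw(u1 * b[0], rng)]
    for i, U in enumerate(Us, start=1):
        seq.append(_draw(U[seq[-1]] * b[i], rng))
    return tuple(seq)


def sample_full_model(u1, Us, E_L, beta, n_steps, seed=0):
    """Independence Metropolis chain targeting exp(-beta * (E_S + E_L))."""
    rng = np.random.default_rng(seed)
    u1 = np.asarray(u1, float)
    Us = [np.asarray(U, float) for U in Us]
    b = _backward(u1, Us)
    cur = sample_reference(u1, Us, b, rng)
    e_cur = E_L(cur)
    chain = []
    for _ in range(n_steps):
        trial = sample_reference(u1, Us, b, rng)
        e_try = E_L(trial)
        # proposal = reference model, so only Delta E_L enters the acceptance test
        if e_try <= e_cur or rng.random() < np.exp(-beta * (e_try - e_cur)):
            cur, e_cur = trial, e_try
        chain.append(cur)
    return chain
```

Region occupancies counted along the returned chain are the Monte Carlo counterparts of eq. (7) for the fully interacting chain.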

Note that chain models with only first-neighbor interdependent units can be taken as references for evaluating the effects of the long-range energy on chain configurations. Therefore, from relations (12) we get:

$$Z_T / Z_S = \langle \exp(-\beta E_L) \rangle_S \qquad (14)$$

The angle brackets indicate an average over the chain configurations of the reference system (energy $E_S$). The free energy difference between the system ($E_S + E_L$) and the reference system is thus given by:

$$\Delta F = -\beta^{-1} \ln \langle \exp(-\beta E_L) \rangle_S \qquad (15)$$
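Given long-range energies of conformations sampled from the reference system, eq. (15) is a one-line estimator. The sketch below (hypothetical function name) factors out the largest exponent before averaging, a standard log-sum-exp device that leaves the result unchanged but avoids overflow when βE_L is large and negative, as it can be for strongly attractive electrostatic terms.

```python
import numpy as np


def delta_F_exponential_average(E_L_samples, beta):
    """Delta F = -(1/beta) ln <exp(-beta E_L)>_S, eq. (15) (hypothetical helper).

    E_L_samples: long-range energies of conformations drawn from the short-range
    reference model. The maximum exponent is factored out before averaging
    (log-sum-exp); the shift cancels exactly and only prevents overflow.
    """
    x = -beta * np.asarray(E_L_samples, float)
    x_max = x.max()
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return float(-log_mean / beta)
```

With β = 1/k_BT, k_BT ≈ 0.596 kcal/mol at 300 K. By Jensen's inequality the estimate never exceeds the sample mean of E_L, which is one reason why ΔF and ⟨E_L⟩ differ by an entropic term.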

To improve the computation method we used a one-half umbrella sampling algorithm (refs. 6,7) for Monte Carlo calculations performed as described previously (ref. 3). Variations of the chain configurational entropy due to the introduction of $E_L$ are estimated using the average value of $E_L$ obtained from an ordinary Monte Carlo sampling (ref. 8). Moreover, a series of N intermediate systems can be introduced (ref. 4) so that successive systems present a good overlap of their configurational spaces; we can then write:

$$\Delta F = \langle E_L \rangle - T\Delta S = -\beta^{-1} \sum_{j=0}^{N-1} \ln \langle \exp(-\beta E_L/N) \rangle_{S+(j/N)L} \qquad (16)$$

with

$$\langle \exp(-\beta E_L/N) \rangle_{S+(j/N)L} = Q(j+1)/Q(j) \qquad (17)$$

and

$$Q(j) = \sum_h \exp(-\beta \{E_S + (j/N) E_L\})$$

Averages corresponding to the different values of j are computed using a Monte Carlo procedure (ref. 3) in which $E_S$ enters through the matrices $\mathbf{U}$ and the fractions of $E_L$ through Boltzmann factors. According to relation (13), $\langle E_L \rangle$ is also obtained with this procedure.
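For a system small enough to enumerate, the staging of eqs. (16)-(17) can be checked directly: the product of the stage ratios Q(j+1)/Q(j) telescopes to Z_T/Z_S whatever N is. In the sketch below, exact state sums stand in for the Monte Carlo stage averages of the text; the function name and test energies are hypothetical.

```python
import numpy as np


def staged_delta_F(E_S, E_L, beta, N):
    """Eqs. (16)-(17) for a system whose states h can be enumerated.

    Stage j averages exp(-beta*E_L/N) over the intermediate system with energy
    E_S + (j/N) E_L. Returns (Delta F, <E_L>, T*Delta S).
    """
    E_S = np.asarray(E_S, float)
    E_L = np.asarray(E_L, float)
    dF = 0.0
    for j in range(N):
        w = np.exp(-beta * (E_S + (j / N) * E_L))                  # Boltzmann weights of stage j
        stage_avg = np.sum(w * np.exp(-beta * E_L / N)) / np.sum(w)  # = Q(j+1)/Q(j)
        dF -= np.log(stage_avg) / beta
    w_full = np.exp(-beta * (E_S + E_L))
    mean_EL = np.sum(w_full * E_L) / np.sum(w_full)                # relation (13) with f = E_L
    return float(dF), float(mean_EL), float(mean_EL - dF)
```

In a real calculation each stage average carries its own sampling error; staging helps because each ratio involves two systems with well-overlapping configurational spaces.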

RESULTS  AND  DISCUSSION 

The calculation methods presented above were applied (with T = 300 K) to simplified models of two peptidic hormones (refs. 9, 10): enkephalin, TYR-GLY-GLY-PHE-LEU (or MET), and β-casomorphin, TYR-PRO-PHE-PRO-GLY. Two different Monte Carlo calculations were performed. One of these procedures is based on the use of the matrices U, according to the method described previously (ref. 3).

The other Monte Carlo sampling method is an ordinary one, in which the short-range interaction energy is calculated at each step of the procedure. The corresponding results are given in Table 1.

The relative number of times a map region is assigned to a unit of the chain gives the probability of occupancy of this region by the unit. Results obtained in this way can be compared with the values calculated according to relation (7). We thus found that the study of region occupancies provides a fine level of conformational analysis, as it shows that the positions of the relative minima in which the chains spend much time are actually different (ref. 3).

The determination of $Z_S$ gives an estimate of $F_S$ and,
TABLE 1

Values (kcal/mole) of the free energy, energy and entropy terms.

                   D       ΔF     <E_L>     TΔS
ENKEPHALIN        3.5    -30.2    -39.0     -8.8
                 17.0     -6.1     -9.9     -3.8
                 35.0     -3.5     -6.4     -2.9
                 80.0     -2.4     -4.4     -2.0
β-CASOMORPHIN     3.5    -20.0    -25.6     -5.6

REFERENCE CHAIN MODELS
ENKEPHALIN       F_S: -20.9 ; <E_S>: -10.0 (U), -9.6 (M) ; TS_S: (not legible)
β-CASOMORPHIN    F_S: -26.9 ; <E_S>: -19.6 (U), -18.7 (M) ; TS_S: (not legible)

D: dielectric constant.
U: calculations with matrices U.
M: ordinary Monte Carlo.


with $\langle E_S \rangle$ already known, the corresponding value of the entropic term $TS_S$ can be calculated.

The results of the calculations clearly show that long-range interactions represent the major contributions to the conformational energy. Their effect on the chain configuration is mainly to drive zwitterionic molecules toward folded forms. Indeed, we have observed that when the interactions between the charged ends of the molecules are screened, the probabilities for a unit to be in a map region, calculated from Monte Carlo sampling with a fully interacting chain model, are very similar to the values obtained with chain models with nearest-neighbor interdependent units.

Since the importance of $E_L$ is essentially due to electrostatic interactions, sampling calculations were performed with different values of the dielectric constant D. The results thus obtained are presented in Table 1. The different values obtained for TΔS, depending on D, show that the loss of flexibility of chain molecules folded by long-range interactions can be well estimated from the variations of the configurational free energy and entropy.

Note that the definition of $E_L$ (and even of $E_S$) may include a potential of mean force in order to simulate solvent effects on molecular conformation realistically. Moreover, the calculation methods presented are completely general and can be applied to very different kinds of chain models.
DISCUSSION


WIPFF - 1. What size of chains are you focusing on? From the examples shown, you seem to deal mainly with short polypeptides.

2. I am somewhat puzzled about the relevance of these simulations for conformational free energies in solution. How would you treat the effective interaction between charged groups and the hydrophobic effects on folding?

PREMILAT - 1. For now, I am just using short polypeptide chains, mainly in order to test the calculation methods. But all the algorithms can be applied to longer chains (not too long, because M.C. is computer-time consuming). The number of regions defined in the (φ,ψ) maps is also an important factor, because it defines the order of the matrices of averaged statistical weights.

2. I think that in this kind of simulation, the solvent effect must be taken into account by some potential of mean force rather than by a molecular model of water.


PETTITT - Is there self-consistency, or feedback, between the short-range and the long-range correlations; that is, do they modify each other?

PREMILAT - The energy of the molecular chain is expressed as a sum of terms, so if any correlation between short-range and long-range interactions exists, it can be analyzed with the M.C. samples of conformations. One can determine the probabilities of occupancy of the different map regions by every chain unit and then see whether they are modified when long-range interactions are included in the calculations. This will give some information on the relative importance of short- and long-range energies and on the correlations between them.
SUMMARY

This paper is devoted to the characteristics of the electrostatic potential in a model enzyme (α-chymotrypsin) active site, and to fast derivations, within an approximate quantum chemistry scheme, of the electrostatic interaction energy as an efficient tool for predicting stable conformations of the partners in an enzymatic reaction.


INTRODUCTION 

Life-related sciences have always been a very appealing domain for theoretical study, but they have been very difficult to investigate with sufficient reliability because of the size of the systems involved. Nowadays, some of these problems are becoming easier to handle thanks to the development of computer technology, particularly one of the most crucial challenges: the study of the interaction between two partners such as an enzyme model active site and its ligand.

A physically very appealing way to study the interaction between several partners is to split the interaction energy into well-defined terms, as in Morokuma's decomposition scheme (ref. 1). That electrostatic forces are most often the leading contribution determining the relative orientation of two or more partners of a complex is already well known (ref. 2), and in this respect the electrostatic potential (ref. 3) has been interpreted as a reactivity index (ref. 4).

In this work, we wish to emphasize the characteristics and usefulness of two electrostatic quantities: the electrostatic potential on the one hand and the electrostatic interaction energy on the other.

THE ELECTROSTATIC POTENTIAL

Given a charge distribution ρ in a medium characterized by a polarization $\mathbf{P}$ and a dielectric constant ε, there result an electrostatic field $\mathbf{E}$ and an electrostatic potential V, related by:

$$\nabla \cdot \mathbf{E} = -\nabla \cdot (\nabla V) = 4\pi (\rho - \nabla \cdot \mathbf{P}) = 4\pi \rho / \varepsilon \qquad (1)$$
The electrostatic potential V (E.P.) is more widely used than $\mathbf{E}$, as this scalar quantity represents the interaction energy between a molecule with the charge distribution ρ and a bare unit positive point charge. Thus, the E.P. reveals in a straightforward way the nucleophilic regions of the system and their relative magnitudes. As such, it has been considered a reactivity index of a molecule (ref. 4).

In the quantum chemistry framework, ρ is expressed in terms of molecular orbitals obtained in a self-consistent way, i.e., orbitals which already take into account such effects as the polarization exerted by each charged particle on its surrounding partners, the exchange repulsion of the electrons, charge transfer, and so on. Hence, the derivation of V from a quantal ρ must not involve any other polarizability, and in eqn. (1) $\nabla \cdot \mathbf{P}$ is set to zero and ε to 1. V is then written:


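$$V(\mathbf{r}) = -\sum_{\mu} \sum_{\nu} D_{\mu\nu} \int d\mathbf{r}' \, \frac{\chi_\mu(\mathbf{r}')\, \chi_\nu(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|} + \sum_a \frac{Z_a}{|\mathbf{R}_a - \mathbf{r}|} \qquad (2)$$

where $D_{\mu\nu}$ are the density matrix elements in the basis of the atomic orbitals $\chi_\mu(\mathbf{r}')$, and $Z_a$ is the nuclear charge of atom a.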

The adequacy of using a dielectric function as a screening function in other contexts is discussed in ref. 5. In this work, all the calculations have been carried out within the consistent quantum chemistry framework, according to eqn. (2).


Polarization corrective term

Nevertheless, there exists another possible correction, not to be confused with the previous dielectric effect: the polarization response of the whole molecule to the charge distribution of the second partner it interacts with. In the case of a bare proton, the correction $V_{PL}$ to V can be evaluated by perturbation theory and approximated, in an SCF treatment, by (ref. 6):


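$$V_{PL}(\mathbf{r}) = \sum_{i}^{NO} \sum_{a}^{NV} \frac{1}{\varepsilon_i - \varepsilon_a} \Big[ \sum_{\mu} \sum_{\nu} c_{\mu i}\, c_{\nu a} \int d\mathbf{r}' \, \frac{\chi_\mu(\mathbf{r}')\, \chi_\nu(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \Big]^2 \qquad (3)$$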


where NO and NV are the numbers of occupied and virtual M.O.s respectively, and ε and c are the M.O. eigenvalues and eigenvectors. This correction $V_{PL}$ may be of some importance in specific problems such as the determination of proton affinities (ref. 7). It is highly sensitive to the inclusion of polarization atomic orbitals (ref. 8), particularly with small basis sets.
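Once the integrals of 1/|r − r′| over pairs of atomic orbitals are available at a point r, eq. (3) amounts to a transformation to the M.O. basis followed by a double sum over occupied and virtual orbitals. The sketch below assumes those integrals are supplied as a matrix `A` by some integral code (not shown) and that `C` holds the M.O. coefficients column-wise; the function name is hypothetical.

```python
import numpy as np


def polarization_correction(C, eps, n_occ, A):
    """V_PL(r) of eq. (3) at a single point r (atomic units).

    C[mu, p]: coefficient of atomic orbital mu in M.O. p (columns = M.O.s);
    eps[p]: M.O. eigenvalues in increasing order, the first n_occ occupied;
    A[mu, nu]: integral of chi_mu(r') chi_nu(r') / |r - r'| over r' at the
    point r (assumed to come from a separate integral code).
    """
    C = np.asarray(C, float)
    eps = np.asarray(eps, float)
    M = C.T @ np.asarray(A, float) @ C            # sum_mu sum_nu c_{mu i} c_{nu a} A[mu, nu]
    num = M[:n_occ, n_occ:] ** 2                  # squared occupied-virtual couplings
    den = eps[:n_occ, None] - eps[None, n_occ:]   # eps_i - eps_a (negative)
    return float(np.sum(num / den))
```

Since every denominator is negative, the correction is always stabilizing; mapping it on a grid costs one matrix `A` per point, which is why it is far more expensive than V itself.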

However, the polarization produced by a unit point charge is in fact much larger than that generated by a real system. Hence, this correction is not expected to invert the electrostatic behavior of neutral interacting species.
Calculation of the electrostatic potential at the α-chymotrypsin active site

As  example  of  the  E.P.  usefulness  in  enzyme  ligand  interaction  study,  several 
active  site  models  of  a-chymotrypsin  have  been  investigated.  The  potential  fea¬ 
tures  were  apprehended  via  3D  maps  at  -  10  kcal/mole  (ref.  5).  Two  distinct  ways 
of  solving  eqn.  (2)  have  been  considered,  the  first  without  approximation  but 
using  D  obtained  after  deorthogonalization  of  CNDO/2  M.O.,  the  second  with  the 
approximation  hereafter  called  the  y  approximation  (ref.  9)  : 

V(r)  =  -  l  q  y  „  +  l  Z  /|r  -  R  |  (4) 

a  xx  aH  a  a  1  a 1 

where $q_a$ are the charges obtained from a Mulliken population analysis, and $\gamma_{aH}$ are the
CNDO/2 (ref. 10) two-electron repulsion integrals between the s orbitals on atom a and on H+ at
point r. The contours at -10 kcal/mole obtained at the two levels of V for the smaller model
active site of α-chymotrypsin (91 atoms) are very similar; the potential maps for the bigger
active-site models (207 and 216 atoms) were therefore calculated within the much faster
approximation.
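A minimal sketch of eqn. (4) evaluated at one probe position follows. The CNDO/2 integrals $\gamma_{aH}$ are not reproduced here: an Ohno-type form $1/\sqrt{R^2+a^2}$ is swapped in as a clearly hypothetical stand-in (it tends to $1/R$ at long range and stays finite at $R = 0$), and $q_a$ is read as an electron population, which is what makes the minus sign in eqn. (4) consistent. Both readings are assumptions.

```python
import numpy as np

HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree

def gamma_stand_in(r, a):
    """Hypothetical stand-in for the CNDO/2 integral gamma_aH (atomic units).
    Ohno-type form: 1/r at long range, finite at r = 0."""
    return 1.0 / np.sqrt(r**2 + a**2)

def potential_gamma(point, coords, z_core, q_el, a):
    """Eqn (4): V(r) = -sum_a q_a gamma_aH + sum_a Z_a / |r - R_a|, in kcal/mol.

    point  : (3,) probe (H+) position, bohr
    coords : (N, 3) nuclear positions R_a, bohr
    z_core : (N,) core charges Z_a
    q_el   : (N,) Mulliken electron populations q_a (assumed reading)
    a      : (N,) range parameters of the stand-in gamma, bohr (hypothetical)
    """
    d = np.linalg.norm(np.asarray(coords, float) - np.asarray(point, float), axis=1)
    electronic = -np.sum(np.asarray(q_el, float) * gamma_stand_in(d, np.asarray(a, float)))
    nuclear = np.sum(np.asarray(z_core, float) / d)
    return float((electronic + nuclear) * HARTREE_TO_KCAL)
```

Evaluating this function over a 3D grid of probe positions and keeping the points where it falls below -10 kcal/mole would give contour data of the kind shown in Fig. 1.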

One of the most striking features of the potential is the clear-cut cooperativity of the
carbonyl groups of the protein backbone, whose potential wells merge to form an impressive
negative region on one side of the molecule. In contrast to these favourable folded
conformations, which generate such a large negative potential cloud, no negative well appears
at this contour level around β-strand substructures (Fig. 1).




Fig. 1. E.P. contour map at -10 kcal/mole for a model active site of α-chymotrypsin
(216 atoms). The subregions 1 to 8 are discussed in ref. 5.
This complementarity can be related to the high dipole moment observed in each model, and it
emphasizes the directionality of the driving force involved in the formation of a Michaelis
complex.

Electrostatic interactions are the dominant driving force of the first step of the enzymic
reaction, so the electrostatic potential is a very useful qualitative tool for understanding
the beginning of the process. However, to go further and quantify the electrostatic interaction
between the two partners, their electrostatic interaction energy has to be calculated.


THE  ELECTROSTATIC  INTERACTION  ENERGY 

For molecular systems, the electrostatic interaction energy is written as
where ρ is the electronic density function of partner A or B. To obtain numerically good
results (ref. 11), the four terms of eqn. (5) must be grouped:


$$E^{ES}_{int} = -\int V_A(\mathbf{r})\,\rho_B(\mathbf{r})\,d\mathbf{r} + \sum_{b\in B} Z_b\,V_A(\mathbf{R}_b) \qquad (6)$$


where $V_A(\mathbf{r})$ is the electrostatic potential of partner A. The second term of eqn. (6)
is easily calculated once $V_A$ is known. The first term can be determined either analytically
or by 3D numerical integration. The analytical electrostatic energy is computed at the STO-3G
W1S level (without the 1s orbitals on the heavy atoms) with our modified link 604 of GAUSS82,
implemented on a Data General MV7800 computer (0.8 MIPS).


A.  Numerical  derivation 

With a well-conditioned 3D grid step of 0.25 Å, one value of $E^{ES}_{int}$ is computed in 25 s,
compared with 103 s for the analytical STO-3G W1S calculation, for the complex between the dyad
water-imidazole (partner A) and methanol (partner B) (Fig. 2). Since determining the most stable
conformations requires many calculations, the numerical procedure gives an appreciable saving
of computer time.
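The numerical route to eqn. (6) can be sketched as below, assuming $V_A$ is available as a callable and $\rho_B$ is tabulated on a uniform cubic grid and summed with a simple rectangle rule; the authors' actual quadrature is not described here. The point-charge $V_A$ and the Gaussian $\rho_B$ are toy stand-ins, not the water-imidazole/methanol system, and all names are hypothetical.

```python
import numpy as np

def cubic_grid(center, half_width, h):
    """Points (bohr) of a cubic grid with step h, symmetric about `center`."""
    n = int(np.ceil(half_width / h))
    axis = h * np.arange(-n, n + 1)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([x, y, z], axis=-1).reshape(-1, 3) + np.asarray(center, dtype=float)

def gaussian_density(points, center, n_electrons, alpha):
    """Illustrative spherical electron density (electrons/bohr^3)."""
    r2 = np.sum((points - np.asarray(center, dtype=float)) ** 2, axis=1)
    return n_electrons * (alpha / np.pi) ** 1.5 * np.exp(-alpha * r2)

def numerical_es_energy(v_a, points, rho_b, z_b, r_b, h):
    """Eqn (6) on a grid, atomic units:
    E = -sum_k V_A(r_k) rho_B(r_k) h^3 + sum_b Z_b V_A(R_b)."""
    electronic = -np.sum(v_a(points) * rho_b) * h**3
    nuclear = np.sum(np.asarray(z_b, dtype=float) * v_a(np.asarray(r_b, dtype=float)))
    return float(electronic + nuclear)

# Toy partner A: a unit point charge at the origin.
def v_point_charge(p):
    return 1.0 / np.linalg.norm(p, axis=-1)

h = 0.25 / 0.529177            # the 0.25 Angstrom step expressed in bohr
center_b = [10.0, 0.0, 0.0]    # toy partner B, 10 bohr away
pts = cubic_grid(center_b, 4.0, h)
rho = gaussian_density(pts, center_b, 1.0, 1.0)
e_neutral = numerical_es_energy(v_point_charge, pts, rho, [1.0], [center_b], h)
e_cation = numerical_es_energy(v_point_charge, pts, rho, [2.0], [center_b], h)
```

For a neutral toy partner B the two terms of eqn. (6) almost cancel, while a net charge of +1 gives close to the point-charge value of 0.1 hartree. This is a quick check that the grouping of eqn. (6) behaves as intended.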

The remaining question is the level of approximation at which $\rho_B$ and $V_A$ are
determined. Two levels are presented here.


Fig. 2. Definition of the rotational coordinates for two configurations (α, β) of the complex
methanol + dyad water-imidazole.


Fig. 3. Electrostatic interaction energy between methanol and the dyad water-imidazole for the
two configurations (α, β). Full line: analytical STO-3G W1S result. Dashed line: numerical
result at level (a). Dash-dotted line: numerical result at level (b).
Level (a): the grids are determined with the deorthogonalized CNDO/2 density matrix elements
$D_{\mu\nu}$, and eqn. (2) is used for calculating $V_A$.

Level (b): $\rho_B$ is obtained as at level (a), but $V_A$ is calculated according to eqn. (4),
within the γ approximation.

The rotational coordinates are defined in Fig. 2. As can be seen in the particular case of the
coordinate ω, both levels give very good results compared with the analytical reference
(Fig. 3).

In conclusion, the numerical procedure runs 4 times faster than the analytical one for the
studied complex and gives very satisfactory results for the relative orientation of the two
partners at a given intersystem distance. It is thus a powerful tool for predicting stable
conformations of complexes in the region where the electrostatic approximation is valid, i.e.
at internuclear distances between the partners greater than 2.5 Å.


B.  Approximate  analytical  derivation 

In the same way, we wanted to find out whether good results could be obtained with an
analytical procedure. Moreover, to keep CPU times reasonable for big systems, only approximate
methods were considered. We chose to work within the CNDO/2 framework (ref. 10), in which the
first three terms of eqn. (5) are written as:


$$E^{ES}_{int,1} = \sum_{a\in A}\sum_{b\in B}\gamma_{ab}\left(\sum_{\mu\in a}\sum_{\nu\in b} D^{A}_{\mu\mu}\,D^{B}_{\nu\nu}\right) \qquad (7)$$


The terms $\left(\sum_{\mu\in a} D^{A}_{\mu\mu}\right)$ can be regarded as the electronic charges
localized on the nuclei a belonging to partner A. They can be evaluated either at the CNDO/2
level or derived from a Mulliken population analysis applied to the MO coefficients (hereafter
called the CNDO/D approximation). The two kinds of derivation were compared for the complex
between the dyad water-imidazole and methanol. Although the two sets of curves, which are
similar to each other, vary much more smoothly and have lower absolute values than the
analytical STO-3G W1S ones, they behave quite correctly: the largest difference in the position
of the minima is only 10° for the rotational coordinates considered. For every curve the CNDO/D
results were better than the plain CNDO/2 ones, so all subsequent calculations were done at the
CNDO/D level.
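Because the orbital sums in eqn. (7) factorize, the first term reduces to a bilinear form in atomic electron populations, which is why the bracketed sums can be read as charges on the nuclei. The sketch below shows that reduction, assuming the diagonal density-matrix elements and a matrix of $\gamma_{ab}$ values are already available; the CNDO/2 integrals themselves are not computed, and the function names and test numbers are hypothetical.

```python
import numpy as np

def atomic_populations(diag_d, atom_of_orbital, n_atoms):
    """P_a = sum_{mu in a} D_mumu: collapse diagonal density-matrix
    elements onto the atoms that carry each orbital."""
    pop = np.zeros(n_atoms)
    np.add.at(pop, np.asarray(atom_of_orbital), np.asarray(diag_d, dtype=float))
    return pop

def first_term_eqn7(gamma_ab, pop_a, pop_b):
    """E_int,1 = sum_a sum_b gamma_ab P_a P_b, with gamma_ab of shape (NA, NB)."""
    return float(np.asarray(pop_a) @ np.asarray(gamma_ab) @ np.asarray(pop_b))
```

Under the CNDO/D variant described above, the populations passed in would come from the Mulliken analysis rather than directly from the CNDO/2 diagonal elements; the bilinear form stays the same.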

Up to this point of the discussion, we had investigated the adequacy of an approximately
calculated (numerically or analytically) $E^{ES}_{int}$ for finding the most stable relative
orientation of two partners. The influence of their internuclear distances was negligible
because none was shorter than 2.5 Å. What happens when, during a rotation or a direct approach,
one or more intersystem distances become too short? As can be seen from Fig. 4, the
electrostatic energy derived with the 3-21G basis set from a Morokuma decomposition (this is
also true for other basis sets) has a completely different shape from the total interaction
energy, although the two have similar values over a large part of the variation domain of the
coordinate γ. It can be noted that the differences in $E_{int}$ between the STO-3G and 3-21G
results are significant. However, when γ is greater than 180°, the distance between the oxygen
of methanol and one of the hydrogens of imidazole becomes shorter than 2.5 Å, and other
components of $E_{int}$, particularly the exchange repulsion term, then become important:
$E^{ES}_{int}$ reaches a minimum which does not exist at this value of γ for $E_{int}$.
Surprisingly, the electrostatic CNDO/D energy does exhibit a minimum at this γ value. That
$E^{ES}_{int}$ (CNDO/D) presents a minimum long before the correct ab initio one, and even before
that of the total SCF interaction energy, has been pointed out by Sokalski et al. (ref. 12) for
the system (H2O)2 and by ourselves for the system H2O + formamide (ref. 13). This misbehavior
arises because the fourth term of eqn. (5) is calculated exactly, while the attractive
resultant, the sum of the first three terms, is obtained within the CNDO/D approximation. This
drawback of the method nevertheless gives a good equilibrium value of γ for the larger system
methanol + (water-imidazole).

This led us to study what happens when many internuclear distances are involved in the relative
orientation of two partners, i.e. when big complexes are considered. Such a study was made on
the model active site of α-chymotrypsin made up of 22 amino acids (262 atoms; see Table 1 of
ref. 5) and the substrate N-acetyl-L-tryptophanamide (33 atoms). The values of $E^{ES}_{int}$
(CNDO/D) were compared with the total interaction energy obtained at the CNDO/2 level. One
$E^{ES}_{int}$ (CNDO/D) value is obtained after ca. 45 s on a VAX 11/780, whereas one $E_{int}$
(CNDO/2) calculation takes ca. 5800 s on an FPS264 attached processor (38 Mflops). This gain of
time is obviously appreciable.

Fig. 4. Electrostatic ($E^{ES}_{int}$) and total ($E_{int}$) interaction energy between methanol
and the dyad water-imidazole in configuration β. Full line: $E^{ES}_{int}$ at the 3-21G level.
Dashed line: $E_{int}$ at the 3-21G level. Dotted line: $E^{ES}_{int}$ at the CNDO/D level.

One local minimum was found for $E^{ES}_{int}$ (CNDO/D) that coincided with one for $E_{int}$
(CNDO/2), and the energy differences between this point and two neighbors were very similar for
the two functions. However, further investigation is needed to confirm this similarity between
the behavior of $E^{ES}_{int}$ (CNDO/D) and $E_{int}$ (CNDO/2).
SURCOUF - Did you analyse the influence of the size of your model active site on the
consistency of your results?

DEHARENG - The results present electrostatic potential maps of the active sites of
α-chymotrypsin and subtilisin. An extensive study was made for α-chymotrypsin, in which the
progressive building of the model active site from 6 up to 19 amino acids clearly showed the
general tendency of the negative cloud of the electrostatic potential to remain localized on one
part of the active site. Although its detailed shape changes slightly from one model to another,
its main characteristics remained quite similar. The number of amino acids taken into account in
subtilisin was therefore chosen on the same basis as for the biggest active-site model of
α-chymotrypsin.
SUMMARY

A fundamental problem in the chemistry and biochemistry of proteins is understanding the role of
a single amino acid residue in determining the molecular properties of the protein. Structural
information from refined X-ray structures of several species of the enzyme dihydrofolate
reductase with the bound inhibitor methotrexate has been reported, and this provides a firm
experimental basis for the theoretical investigation of changes in inhibitor binding brought
about by a single amino acid substitution in the parent enzyme. We have applied a novel
methodology, combining ab initio and thermodynamic integration methods implemented with
molecular dynamics, to determine theoretically the binding free energies of methotrexate and of
the natural substrate dihydrofolate. A “hydrogen bond adapted” parametrization of the potential
gives encouragingly good agreement with experimental free energy values.

INTRODUCTION 

Dihydrofolate reductase (DHFR: tetrahydrofolate dehydrogenase, E.C. 1.5.1.3) is a key enzyme in
the biosynthetic pathways leading to the production of purine and pyrimidine nucleotides.
Specifically, it catalyzes the NADPH-dependent reduction of 7,8-dihydrofolate (H2F). This enzyme
has been the subject of intensive investigation for over three decades (ref. 1). This continuing
interest has been prompted by its importance as the biological target of a large class of drugs,
the antifolates.

Methotrexate (MTX) is a potent inhibitor of the enzyme and still one of the most widely used
antineoplastic agents. The principal difference between the inhibitor MTX and the natural
substrate dihydrofolate (H2F) is the 4-amino group of MTX, which replaces the 4-oxo group of H2F.
Extensive structure-activity relationship studies on antifolates (ref. 2) indicate that the main
feature necessary for the high binding affinity of the inhibitor (Kd = 0.07 nM) is the
2,4-diaminopyrimidine ring. Baker (ref. 3) had already postulated that the increased basicity of
the pteridine ring achieved by the 4-amino substituent permits strong interaction with acidic
groups in the active site of the enzyme.

This interaction between DHFR and MTX has been well studied both experimentally (refs. 4-5) and
theoretically (refs. 6-8). A significant breakthrough in the study of the molecular basis of the
tight binding was achieved by solving the crystal structures of the binary complex DHFR:MTX and
of ternary complexes (with the NADPH coenzyme) at high resolution (refs. 9-10).


Part  of  this  work  was  presented  at  the  International  Symposium:  Molecular  Recognition 
-  Its  Role  in  Chemistry  and  Biochemistry,  Sopron  (Hungary),  24-27.08.1988 
These structures revealed that the Asp27 residue of wild-type DHFR forms a pair of strong
hydrogen bonds with the N1 and 2-amino groups of MTX (Fig. 1).



Fig. 1: Active site of wild-type DHFR with bound MTX (X-ray geometry from ref. 9) used for the ab
initio calculations. MTX is modelled by 6-methyl-2,4-diaminopteridine. Similarly, the amino acid
residues Ile5, Asp27, Ile94, Thr113, Wat403 and Wat405 are modelled by their respective
hydrogen-bond-forming parts.

The authors also proposed that, when bound to the active site of DHFR, the substrate is flipped
over with respect to the position of MTX. Recently, this tentative model was experimentally
confirmed by the X-ray study of Winkler et al. (ref. 11).

Further important advances in understanding the mode of action of DHFR came from the crystal
structures of the Asp27->Ser27 and Asp27->Asn27 mutants (ref. 12) obtained by site-directed
mutagenesis. The geometry of the active-site interactions of the mutant enzymes did not change
with respect to the wild-type enzyme, except for changes in the hydrogen bond pattern due to the
point mutations. In addition, Kd values were determined, showing that while MTX binds about 27
times less strongly to the Asn27 DHFR (Kd = 1.9 nM), the Ser27 mutant has Kd = 210 nM (ref. 12).
Another interesting aspect of binding to the mutant enzymes was revealed by difference
spectroscopy experiments. The protonation state of the inhibitor MTX changes when going from the
wild-type enzyme to the mutants: MTX is protonated in the wild-type enzyme, but not when bound to
either mutant (ref. 13).
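As a worked illustration of what these dissociation constants mean energetically, the sketch below converts Kd ratios into binding free energy differences with ΔΔG = RT ln(Kd,mutant / Kd,wild type), which follows from the standard relation ΔG_bind = RT ln Kd. The temperature of 300 K is an assumption, since the experimental temperature is not stated here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def ddg_from_kd(kd_mut, kd_wt, temperature=300.0):
    """Binding free energy difference dG_bind(mut) - dG_bind(wt) in kcal/mol,
    from two dissociation constants in the same units; positive = weaker binding."""
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)

kd_wt = 0.07  # nM, wild-type DHFR:MTX (value quoted in the text)
for name, kd in [("Asn27", 1.9), ("Ser27", 210.0)]:
    print(name, round(kd / kd_wt, 1), round(ddg_from_kd(kd, kd_wt), 2))
```

The 27-fold and 3000-fold losses in affinity correspond to roughly 2.0 and 4.8 kcal/mol, respectively. This shows how small the free energy differences are that the calculations have to resolve.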

Thus, there is a wealth of experimental data on this enzyme system, which makes it a very
suitable basis for theoretical studies correlating structure and action at the molecular level.
Such correlations require knowledge of the corresponding relation between structure and energy
(ref. 14). Probably the most important factors in such a structure-activity correlation are
associated with electrostatic interactions. Electrostatic interactions involved in hydrogen
bonding and salt bridges are generally important for maintaining the three-dimensional structure
of proteins. When these structural elements are present in the active-site regions of enzymes,
however, they become even more important as key elements in the binding of substrates and
inhibitors (ref. 15).

To achieve a quantitative understanding of the action of biological molecules, one needs to be
able to correlate electrostatic interactions with structural information. The difficulty of such
a task is apparent from the following example: the electrostatic free energy of a point charge
in a polar solvent is about -80 kcal/mol, as calculated by the well-known Born formula. On the
other hand, the experimentally determined difference in the binding free energy of protonated and
unprotonated methotrexate to the active site of DHFR is only about 1.8 kcal/mol.
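The order of magnitude quoted for the Born estimate can be reproduced with the Born formula ΔG = -(332.06 q²/2a)(1 - 1/ε) in kcal/mol, with a in Å. The ion radius of 2.05 Å and the dielectric constant of 78.4 (water) used below are assumptions chosen only to illustrate the arithmetic.

```python
COULOMB_KCAL = 332.0636  # kcal mol^-1 Angstrom e^-2

def born_free_energy(charge, radius, eps):
    """Born solvation free energy (kcal/mol) of a sphere of given charge (e)
    and radius (Angstrom) transferred from vacuum into a dielectric eps."""
    return -COULOMB_KCAL * charge**2 / (2.0 * radius) * (1.0 - 1.0 / eps)

print(round(born_free_energy(1.0, 2.05, 78.4), 1))
```

With these assumed parameters the result is close to -80 kcal/mol, some forty times larger than the 1.8 kcal/mol difference that has to be explained.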

We show, on the well-defined systems DHFR:MTX and DHFR:H2F, how the hydrogen bonds donated by
specific residues of the protein or by fixed water molecules control inhibitor and substrate
binding. The mechanism of charge stabilization in the active site of DHFR is studied by an ab
initio quantum mechanical method for the native enzyme and the two mutants Asp27->Ser27 and
Asp27->Asn27, each with bound MTX and with bound H2F. The calculated microscopic picture of the
changes in electron density (refs. 17-19) brought about by mutation is subsequently used in
statistical mechanical calculations of binding free energy (refs. 20-22). The latter have been
used successfully to calculate inhibitor binding and substrate catalysis, with the restriction,
however, that the force field used must reasonably describe the salient features of the process
energetics. Here we combine the advantages of two powerful methods, ab initio treatment and
molecular dynamics, to calculate small but highly significant changes in the enzyme active site.

METHODS 

The thermodynamic cycle approach and the basic theory of free energy calculations have been
described extensively in the literature (refs. 20-22 and references therein). We therefore
reiterate here only one key feature: we simulate only the enzyme -> mutant enzyme and
enzyme:ligand -> mutant enzyme:ligand transformations, and thus avoid the solution simulation,
which is less accurate, at least for charged amino acid residues (ref. 7). A summary of the
computational details is given below.
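The thermodynamic integration estimate used for each transformation can be sketched as ΔG = ∫₀¹ ⟨∂H/∂λ⟩_λ dλ, with λ = 0 for the wild-type state and λ = 1 for the mutant. The sketch assumes ensemble averages of ∂H/∂λ have been collected at discrete λ values and integrates them with the trapezoidal rule. This discretization is our choice; the 20 ps runs described below may instead have used a continuous (slow-growth) variant. The function names are hypothetical.

```python
import numpy as np

def ti_free_energy(lambdas, mean_dh_dlambda):
    """dG = integral of <dH/dlambda> over lambda, trapezoidal rule.
    Passing lambdas in decreasing order gives the 'backward' estimate."""
    lam = np.asarray(lambdas, dtype=float)
    y = np.asarray(mean_dh_dlambda, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(lam)))

def hysteresis(dg_forward, dg_backward):
    """Forward and backward runs should give opposite values; their sum
    measures the hysteresis (zero for a perfectly reversible simulation)."""
    return dg_forward + dg_backward
```

Running the transformation "forward" and "backward", as done here, and checking that the hysteresis is small is a simple test of convergence.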

Ab initio calculations were performed with the standard 3-21G basis set using the Monstergauss
package (ref. 23). Each system describing the enzyme active site with a bound ligand consisted of
45-50 atoms described by approximately 250 basis functions. To test our approximation, i.e. how
well the active-site model describes the actual situation in the enzyme, we also incorporated the
charge distribution of the enzyme and the crystal water molecules directly into the ab initio
Hamiltonian, using point charges from the molecular dynamics residue library (ref. 24).
Approximately 2100 point charges were included in the ab initio Hamiltonian. Electron density
maps were computed on a grid of approximately 1000 points.
Molecular dynamics simulations were done (ref. 24) on systems of about 1550 enzyme atoms, using
the united atom approximation, and about 200 water molecules. A wall region of radius 18 Å,
centered on the N1 atom of MTX or the N3 atom of H2F, respectively, was employed with a nonbonded
cutoff distance of 9 Å. Atoms outside this region were kept frozen. Energy minimization was
followed by a short MD trajectory to allow the system to reach equilibrium at 300 K, and then the
amino acid in question was mutated in the native enzyme and in the enzyme:inhibitor (substrate)
complex, respectively. The geometry of the enzyme:substrate complex (Fig. 2) was modelled after
the X-ray data of Winkler et al. (ref. 11). The molecular dynamics simulation implemented with
the thermodynamic integration technique was performed under isothermal-isobaric conditions.


Fig. 2: Active site of wild-type DHFR with bound substrate (6-methyl-7,8-dihydropterin). Other
details as in Fig. 1.

Weak coupling to an external temperature bath of 300 K with a coupling constant of 0.4 ps, and to
an external pressure bath of 1.0 bar with a coupling time constant of 0.5 ps, was used to
maintain temperature and pressure in all systems. The leap-frog algorithm was used with velocity
and coordinate rescaling to accomplish the weak coupling. The water model used for the
intermolecular interactions was the simple point charge (SPC) model developed by Berendsen et al.
(ref. 25). It has been shown to describe well the equilibrium properties and free energies of
hydration in molecular dynamics simulations of ionic hydration (ref. 26). The intramolecular
degrees of freedom were treated as constraints using the SHAKE coordinate resetting procedure.
The time step was 0.002 ps. Free energy calculations were done “forward” and “backward” over
20 ps. This proved to be sufficient because of the linear dependence of the free energy change on
time (ref. 27). We used the nonlinear parametrization proposed by Cross (ref. 25) for the van der
Waals potential term, together with ab initio determined charge distribution parameters.
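The weak-coupling scheme described above can be sketched with the stated time step (0.002 ps) and coupling constants (0.4 ps for temperature, 0.5 ps for pressure). The scaling formulas are the standard Berendsen ones. The isothermal compressibility of 4.6e-5 bar⁻¹ (roughly that of water) and the function names are assumptions, not values given in the text.

```python
import math

def berendsen_temperature_scale(t_inst, t_bath=300.0, dt=0.002, tau_t=0.4):
    """Velocity scaling factor lambda = sqrt(1 + dt/tau_T * (T0/T - 1))."""
    return math.sqrt(1.0 + dt / tau_t * (t_bath / t_inst - 1.0))

def berendsen_pressure_scale(p_inst, p_bath=1.0, dt=0.002, tau_p=0.5, beta=4.6e-5):
    """Coordinate scaling factor mu = [1 - beta*dt/tau_P * (P0 - P)]^(1/3);
    beta is an assumed isothermal compressibility in 1/bar."""
    return (1.0 - beta * dt / tau_p * (p_bath - p_inst)) ** (1.0 / 3.0)

def leapfrog_step(x, v_half, force, mass, dt, lam):
    """One leap-frog step for one coordinate; the new half-step velocity
    is rescaled by the temperature coupling factor lam."""
    v_new = lam * (v_half + force / mass * dt)
    x_new = x + v_new * dt
    return x_new, v_new
```

A system hotter than the bath gets a factor slightly below 1, and one above the reference pressure is expanded slightly. The small deviations from 1 reflect the gentle coupling implied by the 0.4 ps and 0.5 ps time constants.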


RESULTS  AND  DISCUSSION 


The results of the proton potential calculations with the ab initio method are given in Tables 1
and 2 for MTX and H2F, respectively. The potential for the MTX complex is plotted against the N1-H
distance in Fig. 3. The potential energy of the proton in the hydrogen bond of the wild-type
enzyme shows that the salt-bridge structure is energetically about 19 kcal/mol more favourable
than the neutral hydrogen bond with the proton at the Asp27 oxygen. This figure is reduced to
17.5 kcal/mol when all protein and crystal water charges are taken into account.



Fig. 3: Proton potential for the proton (H22) in the hydrogen bond between the N10 atom of the
pteridine ring and the respective side-chain atom of residue 27. (a) MTX:WT, hydrogen bond to the
OD2 atom of Asp27; (b) MTX:ASN, hydrogen bond to ND2 of mutant Asn27; (c) MTX:SER, hydrogen bond
to OG of mutant Ser27. R is the distance from the N10 atom of MTX in Å.

TABLE 1

Energetics of proton transfer in the hydrogen bond N1 (MTX) - residue 27 (DHFR). Energy values
are relative to the minimum of the proton potential function. Units: kcal/mol

Residue 27            R(N1-H) [Å]    E [kcal/mol]
Asp27                 1.02           0.
                      1.61           19.0
Asp27 with p.p.c.     1.02           0.
                      1.61           17.5
Asn27                 1.02           21.3
                      1.86           0.
Ser27                 1.02           27.9
                      1.87           0.
Our result can be compared with the recently published calculation by Singh (ref. 7). His simple
model included MTX with the Asp27 residue only, and the potential energy minimum was found on the
neutral Asp side. A preference for a salt bridge in a guanidinium moiety bound to carboxylate was
found previously for a series of substituted guanidinium isosteres (refs. 27-28) mimicking the
H2-receptor antagonists; in that case, however, the energy difference between the two potential
minima was significantly smaller (about 6 kcal/mol). Clearly, the other amino acid residues of
the active site that form a hydrogen bond network around the inhibitor are important. This is
further corroborated by the fact that MTX binds about 30 times less strongly to the
Thr113->Val113 mutant than to the wild-type enzyme (ref. 31). Table 2 gives the proton potential
for the substrate in the three complexes (ref. 27). In the wild-type DHFR the potential minimum is
much shallower and is in fact also of the double-minimum type (not shown here for lack of space).

TABLE 2

Energetics of proton transfer in the hydrogen bond N3 (H2F) - residue 27 (DHFR). Energy values
are relative to the minimum of the proton potential function. Units: kcal/mol

Residue 27     R(N3-H) [Å]    E [kcal/mol]
Asp27          1.02           0.
               1.61           9.1
Asn27          1.02           28.5
               1.86           0.
Ser27          1.02           29.5
               1.87           0.

Electron density difference maps are not presented here (ref. 27), but they provide further
evidence that some of the electron density changes resulting from proton transfer in the
hydrogen bond (Tables 1 and 2) are nonlocal. It therefore appears necessary to describe the local
electron densities in terms of electron distributions that are well equilibrated over a larger
area, possibly over the whole active site. We have designated such electron distributions as
“hydrogen bond adapted”.

TABLE 3

Calculated free energy changes for the inhibitor binding. Units: kcal/mole.

Transformation                ΔG       ΔΔG     ΔΔG(exp.)
E_WT -> E_Asn                 -7.9
E_WT:MTX -> E_Asn:MTX        -10.0     2.1     1.8
E_WT -> E_Ser                 -7.1
E_WT:MTX -> E_Ser:MTX        -11.9     4.8     4.4

Finally, Tables 3 and 4 give the resulting free energy changes for bound MTX and H2F,
respectively. It is encouraging that differences in charge stabilization by the enzyme active
site can be described consistently within the ab initio formalism. A correct representation of
this interaction then leads to a better description of free energy changes by statistical
mechanics.

TABLE 4

Calculated free energy changes for the substrate binding. Units: kcal/mole

Transformation                ΔG       ΔΔG     ΔΔG(exp.)
E_WT -> E_Asn                 -7.9
E_WT:H2F -> E_Asn:H2F        -10.3     2.4     2.2
E_WT -> E_Ser                 -7.1
E_WT:H2F -> E_Ser:H2F        -10.4     3.3     2.8

CONCLUSIONS 

We have applied a novel methodology to determine the binding constants of the natural substrate
dihydrofolate and of the inhibitor methotrexate to three E. coli DHFR enzymes: the wild type and
the Asn27 and Ser27 mutants.

The use of ab initio calculated, “hydrogen bond adapted” charge parameters for the Coulomb
potential in the thermodynamic integration method implemented with molecular dynamics gives
encouragingly good agreement with experiment. Our parametrization of the Hamiltonian seems to
lead to a linear time dependence of the free energy changes. Also, by treating at the ab initio
level all active-site amino acid residues that participate in the hydrogen bonding network with
the ligand, we were able to describe the subtle energetic differences between the hydrogen-bonded
mutant enzyme-ligand complexes and the salt-linked wild-type DHFR complex. This approach seems
promising for general use in describing free energy changes upon binding in protein active
sites, where ligands often form hydrogen-bonded networks with the protein.
SUMMARY

The use of internal and helicoidal coordinates for
modelling biopolymers is illustrated for the case of double-
stranded deoxyribonucleic acid. This approach enables
conformational changes, junctions and fine base-sequence effects
to  be  studied  in  a  controlled  way.  A  new  approach  to  dealing  with 
solvent  effects  is  also  discussed. 

INTRODUCTION 

The  study  of  the  properties  and  interactions  of 
deoxyribonucleic  acid  is  a  subject  of  continually  growing 
importance.  Over  the  last  10  years,  developments  in 
oligonucleotide  synthesis  and  subsequent  single  crystal  x-ray 
studies  have  shown  DNA  structure  to  be  strikingly  more  complex 
than was previously suspected. Fine structure as a function of
base sequence is now known to play an important role in protein-nucleic
acid and drug-nucleic acid recognition and, in particular,
long-range effects, mediated by sequence-dependent axis curvature,
have come to light. Moreover, the biological role of unusual
conformations  such  as  the  left-handed  Z  form,  cruciforms  and 
triple  stranded  segments  has  been  demonstrated. 

Reliable  computer  modelling  of  large  fragments  of  DNA  is 
thus  of  considerable  importance.  However,  such  studies  present  a 
number  of  problems  which  have  not  yet  been  fully  solved.  Firstly, 
in  common  with  studies  of  other  biopolymers,  DNA  modelling 
requires  the  treatment  of  systems  containing  1000  atoms  or  more. 
Secondly,  the  poly-ionic  nature  of  DNA  makes  it  remarkably 
sensitive  to  environmental  factors.  Finally,  many  interesting 
biological  processes  concerning  DNA  clearly  require  its 
flexibility  to  be  taken  into  account  and  often  involve  large 
conformational changes.
Our approach to these problems has been, firstly, to
develop a series of algorithms (refs. 1-4) which enable DNA
fragments to be modelled with fewer variables than are required by
the classical molecular mechanics or dynamics procedures. Through
the  use  of  helicoidal  and  internal  variables  we  are  now  able  to 
study  a  DNA  duplex  with  10  times  fewer  variables  or,  in  the  case 
of  imposed  helical  symmetry,  100  times  fewer  variables  than  would 
otherwise  be  necessary.  Moreover,  the  direct  use  of  helicoidal 
variables  enables  us  to  provoke  conformational  changes  in  a  fully 
controlled  way  and  thus  to  investigate  their  energetics. 

The present article summarizes these methodological
developments, concentrating on our most recent algorithm (ref. 4),
termed JUMNA (Junction Minimisation of Nucleic Acids), as well as
describing some of the applications that we have been able to
carry out for duplex DNA. We also describe the first step in the
next stage of our study, which is aimed at achieving a more
realistic model of the solvent and counterion environment which
surrounds DNA within the cell. This development, termed FIESTA
(Field Integrated Electrostatic Approach), enables the crude
dielectric damping usually employed in simulation studies to be
replaced by a greatly improved representation of the influence of
a continuum dielectric surrounding the van der Waals envelope of a
macromolecule.

METHODS 

Energy formulation

The first step in modelling DNA behaviour is to formulate
an energy functional which describes its internal stability as
well as possible. Computational limits are nevertheless imposed
on this functional by the size of the systems we treat, which
often contain more than 1000 atoms. Within these limits we have
tried to develop a functional which is as precise as possible. We
have paid particular attention to electrostatic interactions and
also to hydrogen bond formation, which plays an important role both
within DNA and in its interactions with other molecules. The
functional we employ, termed "Flex" (refs. 2,3,5), is given below:
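$$
\begin{aligned}
E = {} & \sum_{ij} \frac{q_i q_j}{\varepsilon(R_{ij})\,R_{ij}} + \sum_{ij}\left(-\frac{A_{ij}}{R_{ij}^{6}} + \frac{B_{ij}}{R_{ij}^{12}}\right) \\
& + \sum_{ij}^{\mathrm{HB}}\left[\cos\theta\left(-\frac{A_{ij}^{\mathrm{HB}}}{R_{ij}^{6}} + \frac{B_{ij}^{\mathrm{HB}}}{R_{ij}^{12}}\right) + (1-\cos\theta)\left(-\frac{A_{ij}}{R_{ij}^{6}} + \frac{B_{ij}}{R_{ij}^{12}}\right)\right] \\
& + \sum_{s}\frac{V_s}{2}\left(1 \pm \cos N_s\tau_s\right) + \sum_{a} V_a\left(\theta_a - \theta_a^{0}\right)^{2}
\end{aligned}
$$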

This expression consists of a series of pairwise additive terms.
The first term represents the electrostatic energy, calculated as
a sum of interactions between atomic monopoles and damped by a
dielectric function ε(R) dependent on the interaction distance.
The following three terms represent dispersion-repulsion
interactions calculated with a 6-12 dependence using, as a basis,
the parameter set developed by Zhurkin et al. (ref. 6). Hydrogen
bonds are dealt with by the latter two of these terms, which take
into account their angular dependence through the angle θ. These
parameters were determined by ab initio SCF calculations on model
systems for each type of bond. All of these terms are summed over
pairs of atoms separated by at least three chemical bonds. The last
two terms represent the distortion energy associated with torsion
angles τ_s (including anomeric effects) and valence angles θ_a,
whose parameters were derived from experimental values or again
obtained by ab initio calculations on model systems.
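The hydrogen-bond part of the functional is a mixture of two 6-12 potentials weighted by cos θ. As a minimal Python sketch of one atom-pair evaluation, assuming that cos θ = 1 corresponds to the ideal hydrogen-bond geometry and cos θ = 0 to a perpendicular approach (the text does not spell out the angle convention), the functions below evaluate the non-electrostatic pair terms. The coefficients in the usage lines are hypothetical placeholders, not the Flex parameters of refs. 2, 3, 5 and 6.

```python
def lj_6_12(r: float, a: float, b: float) -> float:
    """Dispersion-repulsion term -A/r^6 + B/r^12 for one atom pair."""
    r6 = r ** 6
    return -a / r6 + b / (r6 * r6)


def lj_minimum(a: float, b: float) -> tuple:
    """Position and depth of the 6-12 well: r^6 = 2B/A, E = -A^2/(4B)."""
    return (2.0 * b / a) ** (1.0 / 6.0), -a * a / (4.0 * b)


def hbond_pair_energy(r: float, cos_theta: float,
                      a_hb: float, b_hb: float, a: float, b: float) -> float:
    """Angle-weighted mixture used for hydrogen-bonding pairs.

    cos_theta = 1 gives the pure hydrogen-bond 6-12 curve (A_HB, B_HB);
    cos_theta = 0 gives the ordinary dispersion-repulsion curve (A, B).
    """
    return (cos_theta * lj_6_12(r, a_hb, b_hb)
            + (1.0 - cos_theta) * lj_6_12(r, a, b))


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to give wells above 2 Angstroms.
    a_hb, b_hb = 300.0, 2.0e4
    a, b = 500.0, 1.0e5
    print(lj_minimum(a_hb, b_hb), lj_minimum(a, b))
    for cos_theta in (1.0, 0.5, 0.0):
        print(cos_theta, hbond_pair_energy(2.0, cos_theta, a_hb, b_hb, a, b))
```

Writing the mixture as a single weighted sum keeps the energy continuous as the hydrogen-bond angle changes, so its derivatives remain well behaved during minimisation.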

The atomic monopoles we employ are calculated by the
Hückel-Del Re technique with a set of parameters specially
developed for the nucleic acids. These parameters were derived by
fitting the monopole potentials and fields on surface envelopes
surrounding the nucleic acid sub-units to the exact values
obtained by ab initio SCF calculations (ref. 5). Particular care
was taken to reproduce the correct charge distribution between these
sub-units by explicit ab initio calculations on nucleosides and on
S-P-S and P-S-P fragments. The dielectric function we have used in
the studies presented here is based on the form proposed by
Hingerty et al. (ref. 7). We have reformulated this function as
shown below so that it is possible to vary both the plateau value
of the dielectric reached at long distance (D) and the slope of
the sigmoidal segment of the function (S).
$$\varepsilon(R) = D - \frac{D-1}{2}\left[(RS)^{2} + 2RS + 2\right]e^{-RS}$$

Trial calculations on the different conformational forms of DNA
showed that the original distance dependence proposed by Hingerty
(corresponding to a plateau value of 80 and a slope of 0.356), with
net phosphate charges reduced to -0.5e, led to good results under
high damping conditions, notably for the Z conformation. However,
for the A and B conformations a lower slope of 0.16 led to the
best overall results.
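A minimal Python sketch of the reformulated sigmoidal dielectric and of the damped Coulomb term it enters. It assumes charges in electron units, distances in Å and the usual conversion factor of 332.06 kcal Å/(mole e²); the 6.5 Å phosphate-phosphate distance in the usage lines is purely illustrative.

```python
import math

COULOMB_KCAL = 332.06  # (kcal/mole) * Angstrom / e^2


def sigmoidal_dielectric(r: float, plateau: float = 80.0, slope: float = 0.356) -> float:
    """eps(R) = D - (D - 1)/2 * [(RS)^2 + 2RS + 2] * exp(-RS).

    eps(0) = 1 and eps(R) rises monotonically towards the plateau D.
    """
    rs = r * slope
    return plateau - 0.5 * (plateau - 1.0) * (rs * rs + 2.0 * rs + 2.0) * math.exp(-rs)


def damped_coulomb(qi: float, qj: float, r: float,
                   plateau: float = 80.0, slope: float = 0.356) -> float:
    """Monopole-monopole energy q_i q_j / (eps(R) R) in kcal/mole."""
    return COULOMB_KCAL * qi * qj / (sigmoidal_dielectric(r, plateau, slope) * r)


if __name__ == "__main__":
    for s in (0.356, 0.16):
        # Two phosphates carrying the reduced net charge of -0.5e.
        print(s, sigmoidal_dielectric(6.5, slope=s),
              damped_coulomb(-0.5, -0.5, 6.5, slope=s))
```

Because dε/d(RS) is proportional to (RS)² e^(-RS), which is never negative, the smaller slope of 0.16 keeps ε(R) lower at every intermediate distance and therefore damps electrostatic interactions less than Hingerty's original value of 0.356.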

Using internal coordinates

The simplest way to define the conformation of a
molecular system for computer study is in terms of Cartesian
coordinates, leading to a total of 3N variables for N atoms.
Although this technique is used in conventional molecular
mechanics, it leads to considerable minimisation problems for
biological macromolecules, which will generally be represented by
several thousand variables. If one wants to reduce the time
necessary for optimisation and also to reduce the problems caused
by local minima in very high dimensional spaces, the only
possibility is to treat fewer variables and to find variables
which describe molecular motion more effectively.

This can be done in a chemically meaningful way by
adopting internal coordinates: bond lengths, valence angles and
dihedral angles. Directly using such variables simplifies the
treatment of the coupled motion of atoms corresponding, for
instance, to rotation around a single bond. A further advantage
of this approach is the possibility of fixing any of the internal
variables during minimisation, notably bond lengths. This can be a
powerful way of simplifying the model, since bond length
variations are generally associated with considerably higher force
constants than either valence or torsion angle variations.

By dramatically reducing the number of variables
necessary to represent a macromolecular system, this approach can
be used to study much larger conformational changes than would
otherwise be possible. This formulation can also be extended to
the description of polymers and, in particular, of DNA directly in
terms of helicoidal parameters. This gives us the possibility of
easily controlling important aspects of their conformation and, as
we shall show, of either imposing helicoidal symmetry or
inducing chosen types of conformational distortion.

However, one problem arises when using internal
coordinates in the presence of the geometrical constraints implied by
ring closure or helicoidal parameters. In these cases, internal
coordinates represent an over-description of the molecular system.
Certain variables become linked by equations which are usually
coupled, non-linear and difficult to solve analytically. This
problem can nevertheless be overcome, as we will illustrate
in the case of our most recent algorithm, JUMNA.

JUMNA (Junction Minimisation of Nucleic Acids)

The aim of our most recent methodology, JUMNA (ref. 4),
is to combine control over the helicoidal parameters of DNA
with an effective treatment of the constraint problem. In order
to achieve this we divide the nucleic acid fragment studied into
3'-monophosphate nucleotides which are positioned with respect to
an axis using helicoidal parameters. These parameters, combined
with backbone torsions and sugar valence angles, become the
variables of our model. A constrained minimiser is then used to
satisfy a total of 4 constraints per nucleotide: the sugar ring
closure distance, a backbone closure distance between C5' and O5',
and the two valence angles P-O5'-C5' and O5'-C5'-C4'.
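The text does not say which constrained minimiser JUMNA uses. Purely as an illustration of how a closure distance can be enforced while other energy terms pull the structure elsewhere, the toy Python sketch below moves one atom p towards a preferred position while requiring |p - anchor| = d0, using an augmented Lagrangian with plain gradient descent. The function name `closure_minimise`, the quadratic "energy" and all numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def closure_minimise(target: Sequence[float], anchor: Sequence[float], d0: float,
                     start: Sequence[float], k: float = 10.0, step: float = 0.01,
                     outer: int = 30, inner: int = 2000) -> Tuple[float, ...]:
    """Minimise f(p) = |p - target|^2 subject to c(p) = |p - anchor|^2 - d0^2 = 0.

    Augmented Lagrangian L = f + lam * c + (k / 2) * c^2: the inner loop
    minimises L at fixed lam, the outer loop updates lam += k * c.
    """
    p = list(start)
    lam = 0.0
    for _ in range(outer):
        for _ in range(inner):
            to_target = [p[i] - target[i] for i in range(3)]
            to_anchor = [p[i] - anchor[i] for i in range(3)]
            c = sum(x * x for x in to_anchor) - d0 * d0
            weight = lam + k * c
            # grad L = 2 (p - target) + (lam + k c) * 2 (p - anchor)
            p = [p[i] - step * (2.0 * to_target[i] + 2.0 * weight * to_anchor[i])
                 for i in range(3)]
        lam += k * (sum((p[i] - anchor[i]) ** 2 for i in range(3)) - d0 * d0)
    return tuple(p)


if __name__ == "__main__":
    # An "open" junction: the atom starts about 1.12 from the anchor, not 1.0.
    p = closure_minimise(target=(2.0, 1.0, 0.0), anchor=(0.0, 0.0, 0.0),
                         d0=1.0, start=(1.0, 0.5, 0.0))
    print(p, math.dist(p, (0.0, 0.0, 0.0)))
```

The starting geometry need not satisfy the constraint; the multiplier update closes it progressively, which mirrors the idea of junctions that are open in the starting structure.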

This approach has several notable advantages. Firstly,
the "junctions" between consecutive nucleotides do not have to be
closed in the starting structure and, consequently, it is not
necessary to find the appropriate nucleotide conformations
corresponding to a chosen set of helicoidal parameters before
beginning energy optimisation. This greatly simplifies the
investigation of unusual and irregular nucleic acid structures for
which no conformational data exist. Secondly, the "junctions"
between nucleotides can open during optimisation, allowing
passage between conformational states which would otherwise be
separated by large energy barriers. Thirdly, it is not necessary
to develop and solve the complicated equations describing the
"closure" of the constrained system. Fourthly, the direct use of
helicoidal variables allows overall deformations such as
stretching, twisting and bending to be carried out in a controlled
way. Finally, it is easy to impose symmetry on the DNA fragments
studied by simply using a single variable in place of the
corresponding helicoidal parameters of a set of nucleotides. In
this way, our present version of JUMNA allows the imposition of
mononucleotide or dinucleotide repeat symmetry and can treat
systems of up to four strands.
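Imposing symmetry by letting one variable stand in for the corresponding helicoidal parameter of a set of nucleotides is a many-to-one mapping; during optimisation the energy gradient with respect to the shared variable is then the sum of the gradients of all the copies it controls. A minimal Python sketch for a single parameter such as twist, with hypothetical function names:

```python
from typing import List, Sequence


def expand_symmetric(shared: Sequence[float], n_nucleotides: int) -> List[float]:
    """Copy the values of one repeat unit onto every nucleotide of a strand.

    len(shared) == 1 imposes mononucleotide symmetry, len(shared) == 2
    dinucleotide symmetry.
    """
    if not shared:
        raise ValueError("need at least one shared variable")
    return [shared[k % len(shared)] for k in range(n_nucleotides)]


def shared_gradient(per_nucleotide: Sequence[float], repeat: int) -> List[float]:
    """Chain rule: dE/d(shared variable) is the sum of dE/d(parameter)
    over every nucleotide parameter that the shared variable controls."""
    grad = [0.0] * repeat
    for k, g in enumerate(per_nucleotide):
        grad[k % repeat] += g
    return grad


if __name__ == "__main__":
    # Hypothetical twists (degrees) for a 6-nucleotide strand.
    print(expand_symmetric([36.0], 6))        # mononucleotide repeat
    print(expand_symmetric([32.0, 40.0], 6))  # dinucleotide repeat
```

The optimiser then works on one or two values per parameter instead of one per nucleotide, which is the origin of the large reduction in variables quoted for symmetric duplexes.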

The JUMNA algorithm thus allows easy construction and
energy optimisation of both regular and irregular DNA oligomers.
The small number of variables necessary, the speed of optimisation
and the control over the final conformation make this approach
very attractive for modelling a wide variety of biologically
important problems involving the nucleic acids. Finally, we note
that we have also recently developed an algorithm
(CURVES, refs. 8,9) which allows the optimal helical axis and a
full helicoidal parameter set to be obtained for irregular DNA
conformations. This tool is of considerable help in interpreting
sequence effects within irregular structures in a clear and
rigorous way.

RESULTS AND DISCUSSION

(i) Deformation of symmetric DNA duplexes

We begin by looking at the flexibility of energy-optimised
B-DNA and A-DNA duplexes with respect to various types
of regular distortion. We will consider three changes with respect
to helicoidal parameters: twist or winding angle (WDG), base pair
rise (ZSH) and base pair displacement along the dyadic axis (XSH),
and also one backbone variable, the sugar pucker phase angle (PHA).
The first two of these distortions are represented graphically in
figures 1 and 2 for the B and A conformations of DNA. In each case
two homopolymeric sequences have been considered, poly(dG).poly(dC)
and poly(dA).poly(dT). From these results it is immediately obvious
that the B conformation is much more flexible, and this is
confirmed for the other types of distortion in table 1, where the
sharpness of the distortion energy curves is measured by their
half-width at 0.5RT (0.3 kcal/mole) above the minimum. It can also
be seen that, for B-DNA, the AT sequence is somewhat stiffer than
the GC sequence, while for A-DNA sequence effects are less marked.
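At 300 K (the text does not state the temperature) 0.5RT is about 0.30 kcal/mole, the threshold quoted above. The Python sketch below computes this threshold and a half-width from a sampled deformation curve. We assume the half-width is half the distance between the two points where the energy first rises 0.5RT above the minimum, located by linear interpolation; the harmonic curve and its force constant in the usage lines are hypothetical.

```python
import math
from typing import Sequence

R_KCAL = 1.987204e-3  # gas constant in kcal/(mole K)


def half_rt(temperature: float = 300.0) -> float:
    return 0.5 * R_KCAL * temperature


def half_width(xs: Sequence[float], energies: Sequence[float], threshold: float) -> float:
    """Half the width of the well at `threshold` above its lowest sampled point.

    xs must be increasing; each crossing is located by linear interpolation.
    """
    i_min = min(range(len(energies)), key=lambda i: energies[i])
    e_cut = energies[i_min] + threshold

    def crossing(indices) -> float:
        prev = i_min
        for i in indices:
            if energies[i] >= e_cut:
                frac = (e_cut - energies[prev]) / (energies[i] - energies[prev])
                return xs[prev] + frac * (xs[i] - xs[prev])
            prev = i
        raise ValueError("curve never rises by `threshold` on one side")

    left = crossing(range(i_min - 1, -1, -1))
    right = crossing(range(i_min + 1, len(xs)))
    return 0.5 * (right - left)


if __name__ == "__main__":
    k = 2.0  # hypothetical force constant, kcal/(mole deg^2), minimum at 36 deg
    xs = [30.0 + 0.01 * i for i in range(1201)]
    es = [0.5 * k * (x - 36.0) ** 2 for x in xs]
    print(half_rt(), half_width(xs, es, half_rt()), math.sqrt(2.0 * half_rt() / k))
```

For a harmonic well E = k x²/2 the half-width is sqrt(2 ΔE / k), so halving a table entry corresponds to a fourfold increase in stiffness.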


TABLE 1

Half-width of distortion curves 0.5RT above the minimum

Form   Sequence   WDG (°)   ZSH (Å)   XSH (Å)   PHA (°)
B      GC         4.2       0.4       1.5       14.0
B      AT         3.3       0.2       1.1       7.0
A      GC         0.3       0.05      0.11      2.0
A      AT         0.4       0.07      0.09      2.0

In geometrical terms, the deformation of the two
allomorphic forms studied occurs in different ways. In B-DNA,
helicoidal deformation is generally absorbed by coupled changes in
other helicoidal parameters, while for A-DNA the adaptation mainly
takes place through rearrangement of the sugar-phosphate backbone.
One exception to this occurs for base pair rise which, for A-DNA,
leads to a rapid decrease in tilt as rise increases. Overall, it
appears that the stiffness and the poor coupling of helicoidal
parameter changes in the A conformation not only lead to its
overall rigidity, but also diminish the influence of base sequence
on its properties.

Figure 1. Deformation energy (kcal/mole) of B- and A-DNA with
respect to twist (WDG).

Figure 2. Deformation energy (kcal/mole) of B- and A-DNA with
respect to rise (ZSH).

(ii) Variable correlations and the B-A transition

When the conformations resulting from the distortions
discussed above are grouped together, it is possible to detect a
number of general correlations between different variables. In the
case of B-DNA we find six linear correlations: (C3'-C4') : WDG,
(C3'-C4') : -(O3'-P), (C3'-C4') : GLY (glycosidic angle),
(O3'-P) : -XSH, ZSH : TLT (tilt angle) and XSH : -WDG. Of these,
the first three have been observed crystallographically for B-DNA
oligomers (refs. 10,11). In the case of A-DNA we find four
correlations: (C3'-C4') : GLY, (P-O5') : -(C5'-C4'), ZSH : -TLT
and TLT : -GLY. Again, the first of these correlations has been
observed crystallographically. Considering the variety of ways in
which the deformations we have imposed could be absorbed, it is
encouraging to find these agreements with experimental data.
Overall, this would seem to indicate that our reduced variable
model of DNA, coupled with the "Flex" energy formulation, is
relatively successful in describing DNA mechanics.
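Correlations of this kind can be detected by collecting pairs of variable values over the set of distorted conformations and computing a linear correlation coefficient together with a regression slope. In the Python sketch below we read the notation "X : -Y" as a negative correlation between X and Y; that reading, the function name and the sample values are our assumptions.

```python
import math
from typing import Sequence, Tuple


def linear_correlation(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson r, least-squares slope dy/dx) for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy), sxy / sxx


if __name__ == "__main__":
    # Hypothetical XSH (Angstrom) and WDG (degrees) from five distorted structures.
    xsh = [-1.0, -0.5, 0.0, 0.5, 1.0]
    wdg = [38.5, 37.4, 36.0, 34.9, 33.6]
    print(linear_correlation(xsh, wdg))
```

A coefficient close to +1 or -1 flags a candidate coupling; the slope then gives the amount by which one variable changes per unit change of the other.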

It is clear from the results presented that the A and B
conformations truly represent two distinct conformational forms of
the DNA double helix. This is particularly clear for the
correlations discussed above, which are generally quite different
for each allomorphic form. The one exception to this rule occurs
for the correlation between sugar puckering (closely linked to the
C3'-C4' dihedral angle) and the glycosidic torsion. These
variables are coupled in the same way in both the A and B forms,
despite the fact that the optimum value of each variable is very
different in the two allomorphs. This observation led us to search
for a simple transition pathway from the B to the A form. While
forced sugar repuckering did not lead to an appropriate
conformational change, reduction of the glycosidic angle resulted
in an easy transition from the B to the A form with virtually no
energy barrier, maintaining a helically symmetric conformation at
each step. This pathway, illustrated in figure 3, is one example
of how control over the helicoidal and internal variables of DNA
enables us to investigate very large conformational changes in a
controlled way and at only limited computational expense.
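The B to A pathway above was generated by driving one variable, the glycosidic angle, and re-optimising the others at each step, a procedure often called adiabatic mapping. The Python sketch below shows only the bookkeeping of such a scan on a hypothetical two-variable surface: x plays the driven coordinate, y stands for every relaxed variable, and `toy_surface` is not a DNA energy.

```python
from typing import Callable, List, Sequence, Tuple

Energy = Callable[[float, float], float]


def relax(energy: Energy, x: float, y: float,
          step: float = 0.1, iters: int = 500, h: float = 1e-6) -> float:
    """Minimise energy(x, y) over y with the driven coordinate x held fixed,
    using gradient descent with a central finite-difference derivative."""
    for _ in range(iters):
        grad = (energy(x, y + h) - energy(x, y - h)) / (2.0 * h)
        y -= step * grad
    return y


def adiabatic_scan(energy: Energy, x_values: Sequence[float],
                   y_start: float) -> List[Tuple[float, float, float]]:
    """Drive x through x_values, relaxing y at each step from the previous
    relaxed value; returns (x, y_relaxed, energy) triples."""
    path = []
    y = y_start
    for x in x_values:
        y = relax(energy, x, y)
        path.append((x, y, energy(x, y)))
    return path


def toy_surface(x: float, y: float) -> float:
    """Hypothetical double well in x, with y coupled linearly to x."""
    return (x * x - 1.0) ** 2 + (y - 0.5 * x) ** 2


if __name__ == "__main__":
    xs = [-1.0 + 0.05 * i for i in range(41)]
    path = adiabatic_scan(toy_surface, xs, y_start=-0.5)
    barrier = max(e for _, _, e in path) - path[0][2]
    print("barrier along the driven path:", barrier)
```

Starting each relaxation from the previous structure keeps the path continuous, so the scan follows a single valley; the barrier read off such a path is an estimate that can depend on the choice of driven coordinate.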

(iii) B-Z junctions and a possible transition mechanism

Since experimental evidence suggests that the B to Z
transition takes place via a junction moving along the double
helix, we began our studies in this area by creating a B-Z
junction model (ref. 4). Starting from two 3-nucleotide pair
fragments in relaxed B and Z conformations, we were able to profit
from the helicoidal variable construction used by JUMNA to form
two types of junction without any base pair disruption. The two
junctions, corresponding to 5'(CpG)3' and 5'(GpC)3' nucleotides at
the B-Z interface (the overall sequences being respectively CGCGCG
and GCGCGC), both required a 6 Å shift of the Z fragment towards
the minor groove side of the B duplex. After energy optimisation
these junctions led to very different conformations. While the CpG
junction resulted in a large rise at the B-Z interface and a kink
of roughly 40° towards the major groove side of the B duplex, the
GpC junction kinked by almost 30° in the opposite direction and
was able to insert the 5'-3' backbone of the Z segment into the
minor groove of the B helix. As we have discussed in an earlier
publication (ref. 4), the junction kinks can be explained by the
rotation of the base pairs around their long axis (right-handed
with respect to the G→C vector) during the B-Z transition. This
rotation, coupled with the fact that the cytidine nucleotide
remains anti, leads to a stretched sugar-phosphate linkage at the
interface which can be relaxed by kinking in the directions
observed. Comparing the energy of these structures with the mean
energy of a 6-nucleotide pair B or Z helix gave formation energies
of 19.3 kcal/mole for CpG and only 10.3 kcal/mole for GpC. These
structures are represented in the stereodiagrams of figures 4
and 5.

Starting from these structures it was possible to
construct a transition pathway which led the uppermost B-DNA pair
to rotate fully and become part of the Z fragment. This
conformational change, which was accomplished by profiting from
the possibility of locking the propeller twists within the JUMNA
algorithm, is illustrated in figure 6. It should be noted that
this pathway did not involve any base pair opening and
corresponded to an energy barrier of roughly 25 kcal/mole. The
fact that the junctions at either end of this transition pathway
were kinked in opposite directions turned out to favour the
transition mechanism, since stacking energy between the rotating
pair at the interface and the Z segment could be regained more
rapidly as the kink direction changed than if the junction had
remained straight.

(iv) Sequence effects on DNA conformation

The studies we have carried out to date (refs. 12,13)
have led to the detection of important and reproducible base
sequence effects on the conformations of B-DNA, A-RNA, Z-DNA and
Z-RNA. For a detailed discussion of the effects within the Z
conformation and the observed correlations with chemical
reactivity studies, the reader is referred to another article in
the present volume (ref. 14). We will limit ourselves here to the
latest results involving the B conformation of DNA.

Figure 6. B-Z transition pathway involving a passage from a CpG
junction (top left) to a GpC junction (top right) by the rotation
of the B pair at the interface.

Our initial studies of B-DNA led to a number of striking
sequence effects involving changes in twist of almost 12° and
changes in rise of more than 1 Å. Tilt also varied widely with base
composition, notably yielding a large negative value for
homopolymeric G tracts, while propeller twist was large and
positive for A tracts. The twist results also appeared to show a
good correlation with the values deduced by Kabsch, Sander and
Trifonov (ref. 15) by fitting to experimental data. However,
following deformation studies of these optimal geometries it was
found that other minima, which we will term B*, corresponding to a
partial transition to the A conformation, existed and were, with
one exception, more stable than the true B forms.


TABLE 2

Conformation and energy changes (kcal/mole) upon passing from the
B to the B* conformation as a function of base sequence

Sequence   Sugar B    Twist B    Sugar B*                      Twist B*   ΔE (B→B*)
(GC)n      C2'endo    GpC>CpG    G: C2'endo, C: O1'endo        GpC<CpG    1.3
(AT)n      C2'endo    ApT>TpA    A: C2'endo, T: O1'endo        ApT<TpA    0.7
(GT)n      C2'endo    GpT>TpG    C: O1'endo, others: C2'endo   GpT<TpG    3.8
(CT)n      C2'endo    CpT=TpC    C: O1'endo, others: C2'endo   CpT<TpC    1.8
(AA)n      C2'endo    -          A: C2'endo, T: C3'endo        -          3.0
(GG)n      C2'endo    -          G: C2'endo, C: C3'endo        -          -1.6

The results listed in table 2 show that these new conformations
resulted from a transition of most of the pyrimidine sugars to
north puckers (O1'endo or C3'endo) and also involved inversions of
the dinucleotide features we had previously observed. This is
notably the case for twist variations within dinucleotide repeat
sequences, as illustrated in the table. It was also discovered that
these new intermediate conformations were stiffer with respect to
helicoidal deformation, in line with the results found for A-DNA
and discussed in section (i). We believe that the existence of
these forms is linked to the fact that the A conformation is
intrinsically more stable than the B conformation using our
simplified dielectric model. These results made it clear, at least
for the highly flexible B form, that it was necessary to improve
the modelling of the environment surrounding DNA, and our first
step in this direction is described in the following section.

(v) FIESTA: a new approach to modelling solvent effects

The basis of FIESTA (Field Integrated Electrostatic
Approach) lies in the classical electrostatic expression
which describes the work necessary to assemble a charge
distribution within a dielectric medium in terms of the field (E)
and the dielectric displacement (D) generated:

$$dw = \frac{\mathbf{E}\cdot\mathbf{D}}{8\pi}\,d\tau = \frac{\varepsilon E^{2}}{8\pi}\,d\tau$$

As shown, under normal conditions where neither dielectric
saturation nor high-frequency fields are involved, D may be written
as εE. This equation can be used to calculate the electrostatic
energy of a system of charges, as the following examples show.

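EXAMPLE 1: The interaction of two point charges q1 and q2 placed
at points r1 and r2 respectively within a continuum dielectric.

The field at an arbitrary point r can be written as,

$$\mathbf{E}(\mathbf{r}) = \frac{q_1(\mathbf{r}-\mathbf{r}_1)}{\varepsilon|\mathbf{r}-\mathbf{r}_1|^{3}} + \frac{q_2(\mathbf{r}-\mathbf{r}_2)}{\varepsilon|\mathbf{r}-\mathbf{r}_2|^{3}}$$

The work necessary to assemble this system is thus,

$$
\begin{aligned}
w(\mathbf{r}_1,\mathbf{r}_2) &= \frac{q_1^{2}+q_2^{2}}{2\varepsilon}\int_0^{\infty}\frac{dr}{r^{2}} + \frac{q_1 q_2}{4\pi\varepsilon}\int \frac{(\mathbf{r}-\mathbf{r}_1)\cdot(\mathbf{r}-\mathbf{r}_2)}{|\mathbf{r}-\mathbf{r}_1|^{3}\,|\mathbf{r}-\mathbf{r}_2|^{3}}\,d\tau \\
&= \text{divergent self-energy} + \frac{q_1 q_2}{\varepsilon r_{12}}
\end{aligned}
$$

Because the cross integral equals 4π/r12, this is simply the usual
Coulomb expression for the energy plus a divergent term associated
with each individual charge, which can be ignored.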

EXAMPLE 2: The transfer of an ion of charge q and radius a from
vacuum to a dielectric medium.

In this case, we can write the work necessary for the transfer as,

$$\Delta w = w - w_0 = \frac{1}{8\pi}\int d\Omega\int_a^{\infty}\left(\mathbf{E}\cdot\mathbf{D} - \mathbf{E}_0\cdot\mathbf{D}_0\right) r^{2}\,dr$$

where E and D refer to the situation within the dielectric and E0
and D0 are the vacuum values. Since the dielectric constant is
unity in vacuum, D0 = E0. Also E = E0/ε, thus D = E0 and,

$$\Delta w = \frac{1}{2}\left(\frac{1}{\varepsilon}-1\right)\int_a^{\infty} E_0^{2}\,r^{2}\,dr = -\frac{1}{2}\left(1-\frac{1}{\varepsilon}\right)\frac{q^{2}}{a}$$

which is the well-known Born formula for ionic solvation.
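The Born result is easy to evaluate numerically. A minimal Python sketch, assuming charges in electron units, radii in Å and the usual conversion factor of 332.06 kcal Å/(mole e²) to obtain kcal/mole; the radii in the usage loop are arbitrary.

```python
COULOMB_KCAL = 332.06  # (kcal/mole) * Angstrom / e^2


def born_energy(q: float, radius: float, eps: float = 80.0) -> float:
    """Born transfer energy -1/2 (1 - 1/eps) q^2 / a from vacuum into a
    continuum of dielectric constant eps (q in e, radius in Angstrom)."""
    return -0.5 * (1.0 - 1.0 / eps) * COULOMB_KCAL * q * q / radius


if __name__ == "__main__":
    for radius in (1.0, 1.5, 2.0):
        print(radius, born_energy(1.0, radius))
```

The energy scales as q²/a and vanishes when ε = 1, as the formula requires for a transfer from vacuum to vacuum.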

The same approach, after a number of reasonable
approximations, can be extended to calculate the energy of a
molecule within a continuum dielectric. We will first represent
the molecule as a distribution of n atomic point charges q_i
surrounded by an envelope formed from the superposition of van der
Waals atomic spheres (radii a_i). The total volume inside the
envelope is divided up into atomic volumes by taking into
account the overlap of the van der Waals spheres of chemically
bound atoms. These volumes are termed CPK atomic volumes since
they correspond to the way a CPK molecular model is built up. It
will be assumed for the present that the dielectric within the
molecular envelope is unity. If we now assume that the dielectric
displacement (or field) generated by the charge distribution is
not affected by the presence of the dielectric boundary, then we
can write the electrostatic potential energy of this system as a
sum of three terms:

$$V(\mathbf{r}_1,\mathbf{r}_2,\dots,\mathbf{r}_n) = V_0 + V_1(\mathbf{r}_1,\mathbf{r}_2,\dots,\mathbf{r}_n) + V_2(\mathbf{r}_1,\mathbf{r}_2,\dots,\mathbf{r}_n)$$

The first of these terms is a zero-point energy, which is simply
the sum of the Born transfer energies for each individual atom,

$$V_0 = -\frac{1}{2}\left(1-\frac{1}{\varepsilon}\right)\sum_i \frac{q_i^{2}}{a_i}$$

The second term corresponds to the Coulomb interaction energies of
the point charges within the dielectric medium,

$$V_1 = \sum_{i<j} \frac{q_i q_j}{\varepsilon r_{ij}}$$

and the third term corresponds to the field energy associated with
the vacuum envelope surrounding the charge distribution,

$$V_2 = \frac{f_v}{8\pi}\left(1-\frac{1}{\varepsilon}\right)\int E^{2}\,dv$$

Note that this term contains the only adjustable parameter of the
method, f_v, a volume factor, which is used to correct for the fact
that discrete solvent molecules cannot perfectly fill the space
outside the van der Waals molecular envelope.
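The first two terms involve only charges, radii and distances, so they can be written down directly. A minimal Python sketch, assuming charges in e, distances in Å and the 332.06 conversion to kcal/mole. Summing V1 over all pairs, with no exclusion of bonded neighbours, is our assumption since the text does not specify it; V2 is left out because it requires the field integrals discussed next.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

COULOMB_KCAL = 332.06  # (kcal/mole) * Angstrom / e^2

Point = Tuple[float, float, float]


def fiesta_v0_v1(charges: Sequence[float], radii: Sequence[float],
                 coords: Sequence[Point], eps: float = 80.0) -> Tuple[float, float]:
    """Return (V0, V1) in kcal/mole.

    V0: sum of atomic Born terms -1/2 (1 - 1/eps) q_i^2 / a_i.
    V1: Coulomb energy of all pairs i < j screened by eps.
    The envelope field term V2 needs the volume integrals and is omitted.
    """
    v0 = -0.5 * (1.0 - 1.0 / eps) * COULOMB_KCAL * sum(
        q * q / a for q, a in zip(charges, radii))
    v1 = 0.0
    for i, j in combinations(range(len(charges)), 2):
        v1 += COULOMB_KCAL * charges[i] * charges[j] / (eps * math.dist(coords[i], coords[j]))
    return v0, v1


if __name__ == "__main__":
    # Hypothetical ion pair: +1e and -1e, radius 1.5 Angstrom, 3 Angstroms apart.
    print(fiesta_v0_v1([1.0, -1.0], [1.5, 1.5], [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]))
```

With ε = 80 the screened pair term is small compared with the Born self-terms, which is why the V2 correction for the low-dielectric interior matters for buried or closely spaced charges.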

In order to evaluate the field integral we will separate
the field contributions at any given atom i into the field I_i
created by the atom itself, the field J_i created by the atoms b
chemically bound to i, and the field K_i due to the remaining
atoms:

$$\mathbf{E}_i = \mathbf{I}_i + \mathbf{J}_i + \mathbf{K}_i$$

After squaring and regrouping terms we can see that,

$$\int E_i^{2}\,dv = \int I_i^{2}\,dv + \int (2\mathbf{I}_i + \mathbf{J}_i)\cdot\mathbf{J}_i\,dv + \int (2\mathbf{I}_i + 2\mathbf{J}_i + \mathbf{K}_i)\cdot\mathbf{K}_i\,dv$$

The first of these integrals is a divergent self-energy which may
be ignored, as in the example shown above. The second integral,
which we will term A_i, must be evaluated. The final integral can be
simplified if we assume that the field from the more distant atoms,
K_i, is homogeneous within the CPK volume C_i of atom i. In this case
we can calculate a simple volume average,

$$\mathbf{F}_i = \frac{2}{C_i}\int (\mathbf{I}_i + \mathbf{J}_i)\,dv$$

and rewrite the last integral as,

$$C_i\,(\mathbf{F}_i + \mathbf{K}_i)\cdot\mathbf{K}_i$$

The volume integrals A_i and F_i unfortunately cannot be solved
analytically; however, it is possible to separate out the atomic
charges they contain to generate a set of simpler integrals,

$$A_i = \sum_b q_b\left(2 q_i S_{ib} + \sum_{b'} q_{b'} T_{bb'}\right)$$

$$\mathbf{F}_i = 2\left(q_i \mathbf{Z}_i + \sum_b q_b \mathbf{U}_{ib}\right)$$

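The volume integrals S, T, Z and U depend only on the geometrical
positioning of the neighbours b chemically bound to atom i:

$$S_{ib} = \int \frac{(\mathbf{r}-\mathbf{r}_i)\cdot(\mathbf{r}-\mathbf{r}_b)}{|\mathbf{r}-\mathbf{r}_i|^{3}\,|\mathbf{r}-\mathbf{r}_b|^{3}}\,dv$$

$$T_{bb'} = \int \frac{(\mathbf{r}-\mathbf{r}_b)\cdot(\mathbf{r}-\mathbf{r}_{b'})}{|\mathbf{r}-\mathbf{r}_b|^{3}\,|\mathbf{r}-\mathbf{r}_{b'}|^{3}}\,dv$$

$$\mathbf{Z}_i = \frac{1}{C_i}\int \frac{\mathbf{r}-\mathbf{r}_i}{|\mathbf{r}-\mathbf{r}_i|^{3}}\,dv$$

$$\mathbf{U}_{ib} = \frac{1}{C_i}\int \frac{\mathbf{r}-\mathbf{r}_b}{|\mathbf{r}-\mathbf{r}_b|^{3}}\,dv$$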

These integrals can be calculated numerically and, with the use of
standard bond lengths and angles, may be built up into an integral
database for each type of atomic center that will be required
(defined by the type of central atom, the types of its bound
neighbours and their spatial positions). Using these standardized
integrals results in an error of roughly 5% compared to using the
exact molecular geometry. Several other refinements can be made to
this methodology, notably a slight modification of the interaction
distance r_ij used to evaluate the fields of distant atoms, which
improves the quality of the volume average introduced in the
integral, and also a correction to allow for overlap between
unbound atoms, notably those forming hydrogen bonds. We also
note that the Born energies of the V0 term may be evaluated over
the full atomic spheres rather than the CPK volumes C_i by a
modification of the volume taken into account in the T_bb'
integrals described above.

The major simplification introduced in the FIESTA model,
namely the neglect of dielectric interface effects on the
dielectric displacement, clearly needs to be justified by tests on
realistic systems. The first such check was carried out for the
case of a number of charges contained within a large spherical
cavity (using a close-packed array of small spheres to mimic a
molecular envelope). This calculation led to excellent
agreement with the exact solution obtained by Kirkwood (ref. 16).
In the case of real molecules it is possible to consider a refined
parameterisation of the atomic van der Waals radii; however, as the
results in table 3 show, simply using Pauling radii and a volume
factor f_v = 1.25 (with our usual Hückel-Del Re charges from the
"Flex" parameterisation) already makes it possible to obtain
excellent agreement with the experimental hydration enthalpies for
the nucleic acid bases. In addition, the FIESTA value for dimethyl
phosphate hydration (in the gt conformation), -92.8 kcal/mole,
falls close to the value of -85.4 kcal/mole (ref. 17) obtained by
a more refined discrete water + continuum calculation (which,
however, used a rather different charge distribution obtained from
CNDO studies).

TABLE 3

Comparison of FIESTA and experimental hydration enthalpies (refs.
18,19) for nucleic acid subunits (kcal/mole)

Molecule    FIESTA    Experimental
Guanine     -34.1     -35.4 ± 2.6
Adenine     -24.7     -22.6 ± 0.6
Cytosine    -29.2     -29.5 ± 1.0
Thymine     -27.3     -23.7 ± 0.7
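To quantify the agreement in Table 3, the short Python script below uses only the values listed in the table to compute the FIESTA minus experiment deviation for each base, the mean absolute deviation, and which FIESTA values fall inside the quoted experimental uncertainty. The names `TABLE_3` and `compare` are illustrative.

```python
# Hydration enthalpies from Table 3, kcal/mole: (FIESTA, experiment, uncertainty)
TABLE_3 = {
    "Guanine": (-34.1, -35.4, 2.6),
    "Adenine": (-24.7, -22.6, 0.6),
    "Cytosine": (-29.2, -29.5, 1.0),
    "Thymine": (-27.3, -23.7, 0.7),
}


def compare(table):
    """Return per-molecule deviations (FIESTA - experiment), their mean
    absolute value, and the molecules lying inside the experimental error."""
    deviations = {name: calc - exp for name, (calc, exp, _) in table.items()}
    mae = sum(abs(d) for d in deviations.values()) / len(deviations)
    within = [name for name, (calc, exp, err) in table.items()
              if abs(calc - exp) <= err]
    return deviations, mae, within


if __name__ == "__main__":
    deviations, mae, within = compare(TABLE_3)
    for name, d in deviations.items():
        print(f"{name:9s} {d:+.1f}")
    print(f"mean absolute deviation {mae:.2f} kcal/mole; within error: {within}")
```

With these values the mean absolute deviation is about 1.8 kcal/mole; the guanine and cytosine values lie within the quoted experimental uncertainties, while those for adenine and thymine lie outside them.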

Overall, it would appear that the FIESTA model can be
used to greatly improve the treatment of solvent effects within
macromolecular simulations. It can be shown to reproduce correctly
many qualitative features of the dielectric environment (short-range
repulsion between charges of opposite sign, correct
asymptotic behaviour of the apparent dielectric for charges buried
within a molecule or close to the solvent interface, forces on
buried charges leading them towards the solvent, etc.) which are
absent in the simple dielectric models generally employed today.
From a practical point of view it is important to note that the
FIESTA energy and its analytic derivatives can be evaluated
rapidly, even for macromolecular systems, and thus can easily be
incorporated in a general conformational energy optimisation
procedure.

CONCLUSIONS 

This  article  summarizes  the  steps  that  we  have  been  able 
to  take  towards  a  more  refined  modelling  of  the  properties  of  the 
nucleic acids. The combined use of internal and helicoidal
coordinates within the JUMNA algorithm forms an excellent basis for
investigating fine base sequence effects, flexibility and
large conformational changes, and has already allowed us to study
a number of biologically interesting problems. Coupling this
approach with the new solvent treatment, FIESTA, that we have
described should now make it possible to take another step towards
a truly realistic simulation of macromolecular behaviour.
DISCUSSION


PEPE  -  The  electric  field  around  ions  is  very  strong.  Did  you  take  into  account  the  effect 
of  the  electric  field  on  the  orientation  of  the  water  molecules  ? 

LAVERY - As you have seen, the FIESTA model treats the solvent as a continuum. In
consequence, no allowance is made for the effects of discrete waters close to the solute.
If such effects turned out to be of prime importance, a continuum model
would undoubtedly be a poor choice. It is thus perhaps encouraging to recall that, in
the case of the dimethylphosphate anion, we have obtained very good agreement
with a more refined model which included an explicit first hydration shell. It would
seem that, in this case, even the strong fields produced by a net charge of -1 can
effectively be dealt with within the continuum treatment. One of our reasons for starting
with this approach was, after all, that the Born model for ionic hydration works so well.
We should also remark that FIESTA, despite its simplicity, certainly represents an
important step in quality beyond the simple distance-dependent dielectric functions which
are commonly used today.
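The Born model invoked above is simple enough to state in a few lines, which makes it a convenient reference point for a continuum treatment such as FIESTA. The sketch below is the textbook Born expression, not FIESTA itself; the dielectric constant of 78.4 (water near 25 °C) and any ionic radius passed in are assumptions supplied by the user.

```python
import math

# CODATA values of the fundamental constants (SI units)
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
AVOGADRO = 6.02214076e23     # 1/mol
J_PER_KCAL = 4184.0

# e^2 / (4 pi eps0) expressed in kcal * Angstrom / mol (about 332.06)
COULOMB_KCAL_A = (E_CHARGE**2 / (4 * math.pi * EPS0)) * AVOGADRO / J_PER_KCAL * 1e10


def born_free_energy(charge, radius_angstrom, epsilon=78.4):
    """Born electrostatic free energy (kcal/mol) for transferring a sphere of
    net charge `charge` (in units of e) and radius `radius_angstrom` from
    vacuum into a dielectric continuum of constant `epsilon`."""
    if radius_angstrom <= 0:
        raise ValueError("radius must be positive")
    return -COULOMB_KCAL_A * charge**2 / (2.0 * radius_angstrom) * (1.0 - 1.0 / epsilon)
```

For instance, `born_free_energy(-1, 2.0)` gives about -82 kcal/mol for a hypothetical monovalent ion of radius 2 Å; the result scales with the square of the charge and inversely with the radius.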


KARPLUS - Why do you quote enthalpies of hydration for your model ? I thought this
treatment should lead to free energies.

LAVERY - You are quite correct: in principle, the FIESTA model, like the Born ionic
hydration model, leads to an electrostatic free energy. However, since the entropy
enters this simple model through the temperature dependence of the dielectric
constant, we did not feel that this was likely to yield a good treatment of entropic factors.
We therefore chose to assume, for the moment, that we are dealing with a model
dielectric where this dependence is removed, and to fit the electrostatic term to enthalpies
alone. This is also more coherent with the enthalpic contributions which we calculate
for the other terms in our energy functional.
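The point made in this answer can be made explicit for the Born model. Assuming a temperature-independent cavity radius a and charge q, the only temperature dependence of the Born free energy is through ε(T), and standard thermodynamics then gives the entropy and enthalpy below (a textbook manipulation, not a formula taken from the FIESTA work):

```latex
\Delta G = -\frac{q^{2}}{8\pi\varepsilon_{0}a}\left(1-\frac{1}{\varepsilon}\right),
\qquad
\Delta S = -\frac{\partial \Delta G}{\partial T}
         = \frac{q^{2}}{8\pi\varepsilon_{0}a}\,\frac{1}{\varepsilon^{2}}\frac{d\varepsilon}{dT},
\qquad
\Delta H = \Delta G + T\,\Delta S
         = -\frac{q^{2}}{8\pi\varepsilon_{0}a}\left(1-\frac{1}{\varepsilon}-\frac{T}{\varepsilon^{2}}\frac{d\varepsilon}{dT}\right)
```

For water dε/dT < 0, so ΔS < 0. If the dielectric is taken to be temperature independent (dε/dT = 0), ΔS vanishes and ΔH = ΔG, which corresponds to the model dielectric described in the answer above.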


SOUMPASIS - There are many stereochemically reasonable propositions for the
structure of B-Z junctions. But in the absence of any definitive experiments which could
discriminate between them, it is equally plausible that they are a priori equiprobable.

However, enzyme digestion studies provide some evidence that single-stranded regions
(4-5 bp) are very probably present. How does your junction
model explain this fact ?

LAVERY  -  Our  model  not  only  suggests  a  stereochemically  and  energetically  feasible 
conformation for B-Z junctions, but also leads to a possible transition pathway, which is
generally more difficult to find. As you suggest, some studies indicate strand opening,
and enhanced chemical reactivity at the junctions has also been found. It is
nevertheless  difficult  to  distinguish  between  a  true  open  state  and  a  simple  fragility  at 
the  junction  region.  Within  our  model,  the  lack  of  stacking  between  the  base  pairs  at 
the  interface  will  certainly  weaken  the  junction  zone  and  allow  easier  chemical  attack. 
What  certainly  cannot  be  explained,  as  yet,  is  the  mobility  and  the  extent  of  B-Z 
junctions  as  a  function  of  base  sequence,  but  we  hope  to  make  some  progress  in  this 
area  soon. 


ANGYAN  -  Do  you  plan  to  include  the  free  energy  of  cavitation  in  the  FIESTA  method  ? 

LAVERY  -  The  inclusion  of  a  cavitation  term  is  certainly  one  of  the  possibilities  that  we 
are  considering.  Cavitation  effects  are  clearly  not  present  in  the  current  formulation 
and  may  be  of  importance.  The  only  constraint  we  impose  upon  ourselves  in  making 
such  extensions  is  that  the  FIESTA  model  should  remain  simple  enough  for  rapid 
computation  in  the  case  of  macromolecular  systems. 
SUMMARY

The conformational stability and structural transitions of polyionic
biomolecules (e.g. DNA, RNA, polysaccharides, charged proteins etc.) are
strongly affected by their interactions with ions. The potential of mean
force (PMF) framework has proved to be a computationally feasible,
general and often surprisingly accurate technique for the quantitative
treatment of the effects caused by statistical interactions between the
biomolecule and the diffuse ionic cloud. The main theoretical results
obtained so far for DNA-simple electrolyte systems are briefly summarized
and discussed below.

INTRODUCTION 

An overwhelming body of experimental results has established
that practically all levels of structural organization of highly charged
molecules such as the nucleic acids are strongly affected by the types
and concentrations of ionic species present in their environment. It is
impossible to review this material here, but since I will focus on DNA I
refer the interested reader to an overview of these phenomena for the
case of DNA, which includes many original references (1).

One of the most dramatic, and therefore most extensively studied and best
characterized, ionic effects on DNA structure is the B-Z transition,
whereby certain sequences under a variety of environmental conditions
undergo a drastic conversion from the usual right-handed form to a
completely different left-handed form. The physical chemistry (2) and
energetics (3) of this transition have been extensively reviewed, but at
this point it is instructive to recall some of the experimental data
pertaining to "simple" salt-induced B-Z transitions of linear DNA
molecules in the absence of subtleties such as cosolvents, supercoiling
etc. Such experimental data sets not only provide valuable insights and
guidance to the theoretician intending to model ion-DNA interactions and
their effects on structural stability, but they also provide the necessary
benchmarks for assessing the usefulness of his or her theoretical
work.

Table 1 summarizes the critical concentrations (i.e. the salt
concentration at which the B and Z forms are equiprobable) for the
canonical d(C-G) sequence and its modification d(m5C-G), in which the 5
position of C is methylated, in various ionic environments.

TABLE 1

Critical concentrations c* of B-Z transitions induced by various salts
(chlorides)^a

Sequence        Counterion                          c* (mM)   Reference
poly d(C-G)     Na+                                 2300      (4,5)
                K+                                  2600      (5)
                Rb+                                 2900      (5)
                Cs+                                 4800      (5)
                Mg2+                                660       (4)
                Zn2+ ^b                             0.15      (6)
                Zn2+-cysteine ^b                    0.08      (6)
                Zn2+-tris(2-aminoethyl)amine ^b     0.003     (6)
poly d(m5C-G)   Na+                                 750       (7)
                Mg2+                                0.5       (7)
                [Co(NH3)6]3+ ^c                     0.02      (7)

^a Transitions at room temperature except for the Zn complexes (35 °C)
^b In the presence of 2 mM NaCl
^c In the presence of 50 mM NaCl

It is clear that, depending on the ionic species, the critical
concentrations may vary enormously (by a factor of about 10^6 between Na+ and
Zn2+-tris(2-aminoethyl)amine!). However, the alkali chlorides (and, in the
case of d(C-G), Mg2+ too) induce the transition at high salt concentrations
(>1 M), whereas more complex cations (and, in the case of d(m5C-G), Mg2+
too) induce the transition at very low concentrations, bearing
stoichiometric relations to the DNA bases.
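The spread quoted above can be checked from the entries of Table 1. In the sketch below the dictionary simply transcribes Table 1; the 100 mM threshold used to separate the high-salt and low-concentration regimes is an arbitrary illustrative choice, not a criterion used by the author.

```python
import math

# Critical concentrations c* in mM, transcribed from Table 1.
CRITICAL_MM = {
    ("d(C-G)", "Na+"): 2300,
    ("d(C-G)", "K+"): 2600,
    ("d(C-G)", "Rb+"): 2900,
    ("d(C-G)", "Cs+"): 4800,
    ("d(C-G)", "Mg2+"): 660,
    ("d(C-G)", "Zn2+"): 0.15,
    ("d(C-G)", "Zn2+-cysteine"): 0.08,
    ("d(C-G)", "Zn2+-tris(2-aminoethyl)amine"): 0.003,
    ("d(m5C-G)", "Na+"): 750,
    ("d(m5C-G)", "Mg2+"): 0.5,
    ("d(m5C-G)", "[Co(NH3)6]3+"): 0.02,
}


def fold_difference(key_a, key_b, table=CRITICAL_MM):
    """Return c*(a) / c*(b) and the same ratio in orders of magnitude."""
    ratio = table[key_a] / table[key_b]
    return ratio, math.log10(ratio)


def split_by_regime(table=CRITICAL_MM, threshold_mm=100.0):
    """Separate high-salt entries (c* >= threshold) from low-concentration
    ones; the 100 mM default is an arbitrary illustrative threshold."""
    high = [key for key, c in table.items() if c >= threshold_mm]
    low = [key for key, c in table.items() if c < threshold_mm]
    return high, low


if __name__ == "__main__":
    ratio, orders = fold_difference(("d(C-G)", "Na+"),
                                    ("d(C-G)", "Zn2+-tris(2-aminoethyl)amine"))
    print(f"ratio = {ratio:.3g} (about 10^{orders:.1f})")
```

The Na+ to Zn2+-tris(2-aminoethyl)amine ratio comes out at about 7.7×10^5, i.e. close to six orders of magnitude.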

The reason for this behavior is that in the former case the transition
is driven by diffuse ionic cloud effects (screening and statistical
collisions), whereas in the latter one has differential stabilization of
the Z form due to specific ionic binding to base sites (e.g. N7 of
guanine).

These two fundamentally different, general modes of interaction of
DNA with ions compete with each other when both simple and binding
cations are present, and they are observed with any other charged
biomolecule in solution as well.

Modelling of the specific binding mode at the microscopic level
requires binding energies (obtained from quantum mechanical
calculations), quantitative consideration of subtle hydration-dehydration
processes, and treatment of many-body effects (since the
equilibrium takes place in a dense aqueous charged phase). The outlook
for coping with these problems is currently very pessimistic. The
diffuse cloud mode is simpler, but not simple enough to be
treated microscopically using standard available simulation techniques
(e.g. Molecular Dynamics, Monte Carlo), primarily due to the
combinatorial explosion encountered in applying computer simulations
to mixtures (e.g. DNA-water-ions), the difficulties in obtaining free
energies, the long range of the Coulomb interaction, etc.

In view of this state of affairs we decided to abandon the currently
fashionable philosophy of entirely computer-based simulation
techniques, to adopt cruder models capturing only the essential features of the
problem, and to use semianalytic approximations well embedded in the
rigorous theory of statistical mechanics.

The  main  product  of  this  strategy  is  a  viable  approximate  framework 
for  quantitative  modelling  of  ionic  diffuse  cloud  contributions  to  the 
total  conformational  free  energy  of  highly  charged  biomolecular 
structures  (8-12)  called  the  PMF  Approach  and  recently  reviewed  in  Ref. 
13. 


PMF  TREATMENT  OF  IONIC  EFFECTS 

We  currently  use  the  simplest  example  of  a  PMF  approach.  The  central 
idea  involved  is  the  replacement  of  the  bare  Coulomb  interactions  of 
charged  solvent  accessible  sites  on  a  biomolecule  (e.g.  phosphates  and 
electronegative  base  atoms  in  the  case  of  DNA)  by  pairwise  additive 
effective  interactions  (the  pmfs)  obtained  after  canonical  averaging  of 
all  other  degrees  of  freedom  (i.e.  water  and  other  ions). 
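For a pair of particles in a fluid, the pair potential of mean force is related to the pair distribution function by w(r) = -kT ln g(r). The sketch below applies only this definition; in the PMF approach g(r) would come from an approximation such as EXP-MSA or HNC, whereas here it is a plain list of numbers supplied by the user (the kcal/mol units and the 298.15 K default are assumptions).

```python
import math

BOLTZMANN_KCAL = 0.0019872043  # gas constant, kcal/(mol K)


def pmf_from_rdf(g_values, temperature=298.15):
    """Pair potential of mean force w(r) = -kT ln g(r), in kcal/mol.

    Each entry of `g_values` is g(r) at one distance; points where
    g(r) == 0 (the excluded, hard-core region) map to +infinity.
    """
    kt = BOLTZMANN_KCAL * temperature
    out = []
    for g in g_values:
        if g < 0:
            raise ValueError("g(r) must be non-negative")
        out.append(math.inf if g == 0 else -kt * math.log(g))
    return out
```

A value g(r) > 1 (an enhanced probability of finding the pair at that distance) gives a negative, i.e. effectively attractive, w(r).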
In general such pmfs are free energies, i.e. they depend on the
thermodynamic state parameters of the system envisaged (e.g. salt
concentration, temperature etc.). They make it possible to
approximately represent solvent contributions to conformational
stability in a very computationally convenient and transferable manner,
irrespective of the geometrical complexity of the biomolecule. The
model currently adopted for the solvent is the well-known Restricted
Primitive Model (RPM) of electrolytes (14), picturing water as a
dielectric continuum and the hydrated ions as charged hard spheres with
distance of closest approach σ.

The RPM pmfs currently in use are obtained from statistical
mechanical approximations such as the semi-analytic Exponential Mean
Spherical Approximation (EXP-MSA) (15,16) and the Hypernetted-Chain
Approximation (HNC), which involves numerical solution of non-linear
integral equations and yields very good results for charged liquids (17).

(More details and references can be found in Refs. 8, 9, 11, 12.)

Implementation of the approach involves specification of a single
parameter, σ, which for NaCl has been fixed at 4.90 Å in all our work (see Refs.
8,9). As explained in Ref. (5), the σ values of the other alkali chlorides can be
obtained without further adjustment from colligative data of their
aqueous solutions. This yields the set of σ values to be used in RPM-PMF
computations: 4.66 Å (KCl), 4.41 Å (RbCl) and 3.77 Å (CsCl).
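The RPM can be written down explicitly: two ions interact as hard spheres of contact distance σ and, beyond contact, through a Coulomb term screened only by the solvent dielectric constant. The sketch below encodes this bare RPM interaction with the σ values quoted above; it does not perform the EXP-MSA or HNC averaging that turns it into a pmf, and ε = 78.4 is an assumed value for water at room temperature.

```python
import math

COULOMB_KCAL_A = 332.0637  # e^2 / (4 pi eps0) in kcal * Angstrom / mol

# Distances of closest approach sigma (Angstrom) quoted in the text.
SIGMA_ANGSTROM = {"NaCl": 4.90, "KCl": 4.66, "RbCl": 4.41, "CsCl": 3.77}


def rpm_pair_potential(r, z_i, z_j, salt="NaCl", epsilon=78.4):
    """Bare Restricted Primitive Model interaction (kcal/mol) between two
    ions of valences z_i, z_j at distance r (Angstrom): infinite below the
    contact distance sigma, Coulombic in a dielectric continuum above it."""
    sigma = SIGMA_ANGSTROM[salt]
    if r < sigma:
        return math.inf
    return COULOMB_KCAL_A * z_i * z_j / (epsilon * r)
```

Because the hard-core term is what carries the short-range many-body repulsion in this model, changing σ from one alkali chloride to another is the only salt-specific input.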

The  remaining  intramolecular  energy  contributions  (e.g.  covalent 
bonding,  torsional  energies,  van  der  Waals  interactions,  etc.),  which  to 
very  good  approximation  do  not  depend  on  diffuse  ionic  cloud 
interactions,  can  be  represented  with  semi-empirical  force  fields 
routinely  used  to  model  biomolecules. 

In recent work (11,12) we have used the AMBER force field developed
by the UCSF group (18).


RESULTS  FOR  DNA 

Using the RPM-PMF outlined above and considering just the phosphate-phosphate
interactions suffices to describe quantitatively not only the
high salt (>1.5 M NaCl) (8,9) but also the low salt (0.1-1.5 M NaCl) (10)
dependence of the canonical B-Z transition of d(C-G) helices. With this
sequence, inclusion of all other salt-independent intramolecular
interactions (11,12) does not change the results in any significant way,
owing to cancellations in the free energy balance governing the B-Z
isomerization.
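Because c* is the salt concentration at which the B and Z forms are equiprobable, it is the root of ΔG_BZ(c) = G_Z(c) - G_B(c) = 0 for whatever free energy model is used. The sketch below locates that root by bisection in ln c, since critical concentrations span many decades (Table 1). The free energy function is supplied by the caller; the toy function in the usage note is hypothetical, chosen only so that the answer is known, and is not the RPM-PMF result.

```python
import math


def critical_concentration(delta_g, c_low, c_high, tol=1e-10, max_iter=200):
    """Find c* with delta_g(c*) = 0, where delta_g(c) = G_Z(c) - G_B(c).

    Bisection is done in log(c) because c* may lie anywhere over many
    decades. delta_g must change sign between c_low and c_high.
    """
    f_lo, f_hi = delta_g(c_low), delta_g(c_high)
    if f_lo == 0:
        return c_low
    if f_hi == 0:
        return c_high
    if f_lo * f_hi > 0:
        raise ValueError("delta_g must change sign between c_low and c_high")
    lo, hi = math.log(c_low), math.log(c_high)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = delta_g(math.exp(mid))
        if f_mid == 0 or (hi - lo) < tol:
            return math.exp(mid)
        if f_lo * f_mid < 0:
            hi = mid            # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return math.exp(0.5 * (lo + hi))
```

For example, with the toy function `lambda c: 0.6 * math.log(2.3 / c)` (concentrations in M, B favoured below 2.3 M), `critical_concentration(toy, 0.01, 10.0)` returns 2.3 to within the tolerance.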

This is not true with other sequences such as d(G-C)12 or d(A-T)6,
for which our AMBER-PMF force field predicts that a NaCl-induced transition
is not possible because of excessively large intramolecular energy differences
(between the B and Z conformations) (work in preparation), in accordance
with the experimental findings.

Further, the RPM-PMF computation for B-Z transitions of poly d(C-G)
induced by the other alkali chlorides (5) reproduces the
experimental data of ref. (5) (shown in Table 1) excellently.

Computation of the full harmonic spectrum and normal modes of d(C-G)3
in B and Z forms over a wide range of NaCl concentrations (0.01-5.0
M), using an AMBER-PMF force field (11), shows that the low-frequency
vibrations depend on salt concentration and that the lowest-frequency mode
of the B form softens drastically precisely in the salt regime where the
transition takes place. This behavior suggests a soft-mode mechanism
for the B-Z conversion analogous to mode softening in solid-state
physics (e.g. ferroelectrics).
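A normal-mode analysis of this kind amounts to diagonalising the mass-weighted Hessian of the energy at a minimum. The NumPy sketch below shows only that step, with a toy two-coordinate Hessian standing in for the AMBER-PMF force field of d(C-G)3 (which is not reproduced here). A soft mode appears as the lowest eigenvalue approaching zero as a control parameter is varied: salt concentration in the text, a hypothetical spring constant in the toy.

```python
import numpy as np


def normal_mode_frequencies(hessian, masses):
    """Harmonic frequencies (square roots of the eigenvalues of the
    mass-weighted Hessian) and the corresponding normal-mode vectors.

    `masses` holds one mass per Cartesian coordinate; units of the
    frequencies follow from the units of the inputs.
    """
    h = np.asarray(hessian, dtype=float)
    inv_sqrt_m = 1.0 / np.sqrt(np.asarray(masses, dtype=float))
    mass_weighted = h * np.outer(inv_sqrt_m, inv_sqrt_m)  # M^-1/2 H M^-1/2
    eigvals, eigvecs = np.linalg.eigh(mass_weighted)       # ascending order
    # Clip tiny negative round-off (e.g. for free translations) to zero.
    freqs = np.sqrt(np.clip(eigvals, 0.0, None))
    return freqs, eigvecs


if __name__ == "__main__":
    # Toy chain: mass 1 tied to a wall by spring kw and to mass 2 by spring 1.
    for kw in (1.0, 0.1, 0.01):
        h = [[kw + 1.0, -1.0], [-1.0, 1.0]]
        print(kw, normal_mode_frequencies(h, [1.0, 1.0])[0][0])
```

In the toy, the lowest frequency falls towards zero as the wall spring kw is weakened, which is the signature of mode softening.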

A study of the relative stabilities of the A, B, C, ZI, ZII and alternating B
conformations considering phosphate-phosphate interactions only (9)
has predicted, in addition to the B-Z case, a salt-induced B-A transition
at 1.85 M NaCl and a B-ZII transition at 0.2 M NaCl.

The latter is not possible (with d(C-G) helices) when one uses the
full AMBER-PMF, owing to large intramolecular energy contributions, but
the B-A transition is again obtained for d(G-C)12.

Other applications of the PMF approach to hairpin-B duplex
transitions (19), all-B to mixed B/Z transitions (20) and helix-coil
transitions (21) have been discussed in Ref. (13).

In addition, the approach can be used to approximately determine the
three-dimensional ionic distributions around DNA (22).


CONCLUDING REMARKS

Real progress in realistic modelling of highly charged biomolecules in
solution, including their subtle conformational transitions and
interactions, is only possible if we somehow manage to handle the
formidable complexity of aqueous, multicomponent, inhomogeneous,
charged fluids.

Practically none of the results reported in this work could have been
obtained using presently widespread simulation techniques (Monte Carlo,
Molecular Dynamics), in spite of the availability of supercomputers.
The brute force approach is simply not brute enough to cope with the
very brute problems we are interested in.



However, we have provided some evidence that one does not have to
keep track of every single particle in the world in order to obtain
approximate, albeit very useful, answers to relevant questions in
biomolecular modelling.

Every many-body system, no matter how complex, has to obey the
laws of large numbers, and the averaging taking place wipes out all but
the most relevant facets of the truly microscopic description. Any
approximate theory firmly based on statistical mechanics and taking
into consideration these most relevant features of the problem is likely
to be a useful one, and can often be refined step by step.

The simplest version of the PMF Approach, based on the RPM
description of the solvent and the Kirkwood Superposition
Approximation (KSA), is currently being adapted to the treatment of ionic
mixtures (for application to specific ionic binding problems), and the
KSA is being refined through inclusion of higher-order correlations.
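In the Kirkwood superposition approximation the many-site distribution function is written as a product of pair distribution functions, so taking -kT ln turns the many-site potential of mean force into a sum of pair pmfs. The sketch below shows only this bookkeeping for a set of fixed charged sites (for DNA, e.g. the phosphates). The `pair_pmf` argument is a user-supplied function, and the screened Coulomb term in the usage example is a placeholder in arbitrary units, not the EXP-MSA or HNC pmf.

```python
import itertools
import math


def total_pmf(positions, charges, pair_pmf):
    """Superposition (pairwise-additive) estimate of the N-site potential
    of mean force: W(r_1, ..., r_N) ~ sum over i < j of w(r_ij; z_i, z_j).

    `positions` are (x, y, z) tuples, `charges` the site valences, and
    `pair_pmf(r, z_i, z_j)` a user-supplied pair potential of mean force.
    """
    total = 0.0
    for i, j in itertools.combinations(range(len(positions)), 2):
        r = math.dist(positions[i], positions[j])
        total += pair_pmf(r, charges[i], charges[j])
    return total


if __name__ == "__main__":
    def screened(r, zi, zj, kappa=1.0):
        # Placeholder pair term only; not the pmf used in the PMF approach.
        return zi * zj * math.exp(-kappa * r) / r

    sites = [(0.0, 0.0, 0.0), (6.5, 0.0, 0.0), (0.0, 6.5, 0.0)]
    print(total_pmf(sites, [-1, -1, -1], screened))
```

Refining the KSA through higher-order correlations would add three-body and higher terms to this sum.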

At  the  solvent  modelling  level,  we  are  testing  some  structurally 
more  realistic  descriptions  and  will  present  first  results  in  the  near 
future. 
DISCUSSION


PULLMAN - 1) When you discuss the ionic effects on conformations, you take into
consideration only the phosphate backbone of DNA. While this could be a good
approximation for Na+, Mg++ etc., one should be more careful about Cs+, because we
know from the experimental results of Skuratowski that this ion sits essentially in the
grooves of DNA and not on the phosphates. I know why it does so, but this is a
secondary problem.

2) You indicate that Poly(dA)·Poly(dT) is stable only in the B form.
This could induce some people to consider this polymer as a typical representative of
this form, which it is not. Poly(dA)·Poly(dT) has a very particular structure which sets it
somewhat apart.

SOUMPASIS - 1) I think that the well-localized chain of Cs+ ions Skuratowski et al.
have reported in their X-ray DNA fiber studies will not survive the process of
dissolution. If it were present in solution, it would give rise to physicochemical behavior
completely different from that of the other alkali cations, which is not the case as far as I know.

2) It is true that Poly(dA)·Poly(dT) is a special member of the B family, not
identical to average B. I just wanted to emphasize the fact that one cannot drive it away
from the B family under conditions where other members become A, Z, etc.


LAVERY  -  There  are  two  very  interesting  cases  where  it  would  be  useful  to  know  about 
ionic  stabilization  effects  : 

i)  curved  DNA 

ii)  DNA  with  one  face  neutralized  to  model  protein  binding 

Can  PMF  theory  be  extended  to  model  these  asymmetric  situations  ? 

SOUMPASIS - Yes. Provided you know the positions of the charges, PMF theory will
always yield an estimate of the stabilization free energy due to the diffuse ionic cloud,
irrespective of the geometrical complexity of the structures involved.


WIPFF - Concerning the induced B -> Z conversion of d(C-G) sequences, you show
that the Zn-tris(2-aminoethyl)amine complex is very efficient at ≈ 3 mM.

How does that complex interact with DNA ?


SOUMPASIS - It interacts via specific binding of Zn, presumably to N7 of guanine.
Further stabilization by H-bonding to nearby DNA atoms is also very likely.


DYMEK  -  You  have  experimental  data  on  critical  concentrations  of  sodium  and 
magnesium  for  B-Z  transition,  but  only  for  sodium  is  the  free  energy  difference 
calculated.  Why  is  it  not  done  for  magnesium  ? 

SOUMPASIS - We have not yet assigned a distance of closest approach σ for MgCl2,
and in addition detailed experimental free energy data were not available until very
recently. The Mg case will be treated in a future publication. If one assumes σ = 5.0 Å,
one obtains a transition at 1.0 M MgCl2 (experimental value 0.66 M MgCl2) (D.M.
Soumpasis, Proc. Natl. Acad. Sci. USA 81, 5116-5120 (1984)).


HUIGE - To what extent is the precise form of the PMF potential important for
calculations of the B-Z transition ?

SOUMPASIS - It is absolutely essential to describe the many-body effects of short-distance
repulsion (e.g. hard spheres within the RPM) as well as possible. Treatment
of the Coulomb interactions alone (e.g. ions modelled as points) does not yield a B-Z
transition in the experimental regime of monovalent salt concentrations (> 1 M).


RULLMANN  -  To  what  extent  is  the  PMF  method  applicable  to  charge-charge 
interactions  in  general  ? 

SOUMPASIS - The general philosophy of the PMF method applies to any
charge-charge interaction in any many-body system whatever. The pair PMF is simply the
effective pair interaction of any two charges at given fixed positions when all other
degrees of freedom have been statistically averaged. However, the accuracy of the
numbers you get will of course depend on both the structural model of the system and
the approximations you use to do the averaging.
OHLENBUSCH - Would you be able to take the coordinates of my chromatin subunit
model to make estimates about its stability ?

SOUMPASIS - Yes.


SUMMARY

Hydration shells in the major and minor grooves of double helices with various
nucleotide sequences and conformations have been studied. Monte-Carlo simulation
of systems containing a stack of complementary base pairs and 30
water molecules per pair has been performed. Characteristic features of the
hydration shell structure have been found for the stacks of repeating A:T and
G:C base pairs as well as alternating (A:T, T:A) and (G:C, C:G) ones. The presence
of features in the arrangement of hydrophilic centres that are common to
various sequences and configurations is manifested in similar structural
elements of the hydration shells. Probabilities of the formation of bridges,
formed by 1, 2 and 3 water molecules, between hydrophilic centres of the
bases have been estimated. Hydration shell structure was found to depend
significantly on the stack sequence and configuration, while global hydration
characteristics are only slightly dependent on the nature of the stack.
For the stacks in the A configuration, the number of water molecules forming more
than one H-bond with the bases is greater than for the stacks in
B-like configuration. This result is discussed in connection with the
concept of hydration economy during the B to A transition.

INTRODUCTION 

The mutual influence of the spatial organization of the polynucleotide
double helix and the structure of its environment can be considered
as experimentally established. One of the most important manifestations
of the properties of polynucleotide fragments with different sequence
and conformation is the difference in their hydration. Thus, a double-layer
water spine is characteristic of the central part of the d(CGCGAATTCGCG)
dodecamer (ref. 1), while in the case of the d(CCAAGATTGG) decamer crystal (ref.
2) and that of the modified d(GCGCGC) hexamer (ref. 3), water bridges ("springs")
between N3 or O2 base atoms and the sugar ring oxygen of the neighbor nucleotide
are typical elements of the hydration shell structure. These differences can
be reflected in specific interactions of DNA with proteins and other molecules
of biological importance. Regular structures formed by water molecules
in the vicinity of the DNA surface can be replaced by other molecules. Thus
netropsin can, according to X-ray data, replace the double-layer water spine in
the minor groove of the double-stranded dodecamer d(CGCGAATTCGCG) (ref. 4).

Interactions with water molecules can give rise to conformational peculiarities
in certain parts of DNA, which are important for the biological functioning
of the biopolymer. Computer simulation is one of the most powerful
tools in the study of biopolymer-water interaction. Such simulation can
provide information which cannot be obtained by conventional experimental
methods. Studies of polynucleotide-water systems by computer
simulation have been performed successfully during the last decade in several
laboratories, and interesting results on the hydration shell structure of DNA, its
components and complexes have already been obtained (refs. 5-10).

Our group is engaged in a systematic investigation of the hydration of
nucleic acid fragments using Monte-Carlo simulation. We have previously
developed methods of calculating the energy of related systems (ref. 7),
simulated the hydration of bases and complementary pairs (ref. 8), and
simulated the hydration shells of the minor and major grooves of
poly(dA)·poly(dT) in two different B conformations (ref. 10).

Here we present some results of the simulation of the hydration of stacks
formed by complementary base pairs with different nucleotide sequences:
(A:U)n, (G:C)n, (A:U,U:A)n, (G:C,C:G)n. We consider how the hydration of
certain hydrophilic centres depends on the stack configuration and sequence,
determine the probability of water bridges between base atoms, and identify
and characterize the most typical fragments of the hydration shell. The simulations
showed that subtle peculiarities of the hydration shell structure, but not the
global characteristics, strongly depend on the sequence and configuration of the
stack. Some preliminary results of this work were included in an earlier
publication (ref. 9).

METHODS  OF  CALCULATIONS 

All the simulated systems contained a stack of six base pairs, arranged as
in a certain conformation of the double helix, and 180 water molecules. The N1
pyrimidine and N9 purine atoms were methylated. Helical periodic boundary
conditions were imposed along the helical axis of the stack. Each water molecule
was required to lie within 9.5 Å of the centre of at least one of the bases.

Mutual arrangements of the bases which in their general features resemble those
found in the A- and B-forms of DNA were considered for each sequence. For the
repeating A:U stack, an additional B' conformation resembling that in the central part
of the d(CGCGAATTCGCG) dodecamer (ref. 1) was considered. For other sequences
such base arrangements are extremely unfavorable energetically.

Potential functions specially devised for such systems (refs. 7, 12) have
been used for calculating the interaction energy. 1-6-12 type potential functions
are used to calculate the interactions between all the atoms, except those
involved in H-bonding; in the latter case, 1-10-12 functions are used.
A geometrical criterion was used for the determination of H-bonds: an A-H...B
configuration was counted as an H-bond if the A...B distance was less than 3.2 Å and
the H...B distance was less than 2.4 Å, where A and B are O or N atoms.
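The geometric H-bond criterion stated above translates directly into code. A minimal sketch, assuming Cartesian coordinates in Å and element symbols supplied by the caller (the function name is hypothetical):

```python
import math

HBOND_ELEMENTS = {"O", "N"}


def is_hbond(donor, hydrogen, acceptor, donor_el, acceptor_el,
             max_ab=3.2, max_hb=2.4):
    """Geometric H-bond test: A-H...B counts as an H-bond if the A...B
    distance is below 3.2 A and the H...B distance below 2.4 A, with A and
    B both O or N atoms. Positions are (x, y, z) tuples in Angstrom."""
    if donor_el not in HBOND_ELEMENTS or acceptor_el not in HBOND_ELEMENTS:
        return False
    return (math.dist(donor, acceptor) < max_ab
            and math.dist(hydrogen, acceptor) < max_hb)
```

The H...B condition is what excludes configurations in which the A...B distance is short but the hydrogen points away from the acceptor.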

Average energetic and structural hydration shell characteristics were
calculated from a statistically significant sample of configurations obtained
by the Metropolis et al. algorithm (ref. 13). Each of the states
accepted by the Metropolis procedure corresponds to an instantaneous (I)
structure of the hydration shell. Along with analysis of the ensemble of
I structures, we obtained "frozen" (F) structures (ref. 14) by lowering the
temperature of the simulation down to 5 K. Positions of water molecules in F
structures are close to those corresponding to local potential energy minima.
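The sampling described above follows the standard Metropolis scheme. The sketch below shows one Metropolis move for a set of water positions, including the rule that every water must stay within 9.5 Å of at least one base centre. It is a simplification under stated assumptions: waters are moved as points (no rotations), the helical periodic boundary conditions are omitted, and `energy_fn` stands for a hypothetical total-energy function in kcal/mol. Calling it with `temperature=5.0` mimics the quench used to obtain F structures.

```python
import math
import random

BOLTZMANN_KCAL = 0.0019872043  # kcal/(mol K)
MAX_DIST_TO_BASE = 9.5         # Angstrom, constraint quoted in the text


def within_shell(pos, base_centres, max_dist=MAX_DIST_TO_BASE):
    """True if the water lies within max_dist of at least one base centre."""
    return any(math.dist(pos, c) <= max_dist for c in base_centres)


def metropolis_step(positions, energy_fn, base_centres, temperature,
                    step=0.3, rng=random):
    """Attempt one Metropolis move; return True if it was accepted.

    A randomly chosen water is displaced by up to `step` Angstrom per axis.
    Moves violating the shell constraint are rejected outright; otherwise
    the move is accepted with probability min(1, exp(-dE / kT)).
    `positions` is a list of (x, y, z) tuples modified in place.
    """
    i = rng.randrange(len(positions))
    old = positions[i]
    trial = tuple(x + rng.uniform(-step, step) for x in old)
    if not within_shell(trial, base_centres):
        return False
    e_old = energy_fn(positions)
    positions[i] = trial
    d_e = energy_fn(positions) - e_old
    if d_e <= 0 or rng.random() < math.exp(-d_e / (BOLTZMANN_KCAL * temperature)):
        return True
    positions[i] = old  # rejected: restore the previous configuration
    return False
```

Each accepted state then plays the role of an I structure; averages over many such states give the ensemble quantities.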

For each system studied, the number of I structures obtained was about 2·10^6.

RESULTS AND DISCUSSION
General regularities.

Total characteristics of the hydration shells of the stacks are shown
in Table 1.


TABLE 1

Energy and structure characteristics of hydration shells of base pair stacks

Sequence              (A:U)n               (A:U,U:A)n    (G:C,C:G)n    (G:C)n
Configuration         A      B      B'     A      B      A      B      A      B

Energy values^a
  total               -906   -905   -901   -887   -896   -945   -953   -970   -982
  water-water         -7.16  -7.26  -7.26  -6.99  -7.20  -6.91  -7.08  -7.03  -7.12
  water-pur           -23.0  -22.0  -22.5  -22.9  -22.0  -30.8  -30.2  -34.4  -33.6
  water-pyr           -20.7  -19.8  -18.8  -20.1  -19.4  -23.1  -22.4  -21.8  -23.1

Total numbers of H-bonds^a
  water-water         1.75   1.76   1.75   1.67   1.71   1.68   1.70   1.74   1.76
  water-pur           3.10   3.05   3.20   3.14   2.82   4.61   4.68   4.37   4.38
  water-pyr           2.39   2.38   1.94   2.44   2.16   2.24   1.95   1.79   1.84

Numbers of water molecules H-bonded to base atoms
  N7                  1.40   1.51   1.59   1.41   1.11   1.60   1.58   1.37   1.36
  O4, O6              1.38   1.23   1.20   1.44   1.09   1.04   1.16   1.00   1.06
  N6-H, N4-H          0.80   0.91   0.87   0.83   0.73   0.93   0.90   0.68   0.81
  N3                  0.88   0.63   0.74   0.90   0.97   1.08   0.99   1.07   0.98
  O2                  1.01   1.15   0.74   0.99   1.07   1.30   1.04   1.07   1.02
  N2-H                -      -      -      -      -      0.90   0.96   0.92   0.98

^a Total energy refers to six base pairs and 180 water molecules; the water-water
and water-base energy values and H-bond numbers refer to one water molecule and
one base, respectively.
For both A and B stacks, the average total potential energy of the system is lower for G:C-containing stacks than for A:U ones, as was the case for separate base pair-water systems (ref. 8). Our simulations show that the overall hydration characteristics, such as the energy and its components and the numbers of water-water and water-base H-bonds, depend only slightly on stack sequence and configuration. At the same time, subtle features of the hydration shell structure in both grooves of the double helix change substantially when the stack configuration and/or sequence is changed. This change in hydration shell structure is most clearly demonstrated by the pattern of water bridges formed between hydrophilic centres of the bases. Features of the arrangement of hydrophilic centres that are common to various sequences and configurations (ref. 11) lead to common structural elements of the hydration shells.
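Both statements in this paragraph can be checked against Table 1: in each configuration the total energies of both G:C-containing stacks lie below those of both A:U stacks, while the per-molecule water-water energy varies by only about 5% across all nine systems. A minimal sketch of these checks follows; the energies are copied from Table 1 (whose energy units are not stated in this section), and the function name is illustrative.

```python
TOTAL_ENERGY = {
    "(A:U)n": {"A": -906, "B": -905, "B'": -901},
    "(A:U,U:A)n": {"A": -887, "B": -896},
    "(G:C,C:G)n": {"A": -945, "B": -953},
    "(G:C)n": {"A": -970, "B": -982},
}
WATER_WATER = [-7.16, -7.26, -7.26, -6.99, -7.20, -6.91, -7.08, -7.03, -7.12]
AU_STACKS = ("(A:U)n", "(A:U,U:A)n")
GC_STACKS = ("(G:C,C:G)n", "(G:C)n")


def gc_below_au(config):
    """True if every G:C-containing stack has a lower total energy than every A:U stack."""
    highest_gc = max(TOTAL_ENERGY[s][config] for s in GC_STACKS)
    lowest_au = min(TOTAL_ENERGY[s][config] for s in AU_STACKS)
    return highest_gc < lowest_au


mean_ww = sum(WATER_WATER) / len(WATER_WATER)
relative_spread = (max(WATER_WATER) - min(WATER_WATER)) / abs(mean_ww)

for config in ("A", "B"):
    print(f"configuration {config}: G:C stacks lower than A:U stacks -> {gc_below_au(config)}")
print(f"spread of water-water energy per molecule: {relative_spread:.1%} of the mean")
```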

Three types of water bridges are considered: those formed by one (B1), two (B2) and three (B3) water molecules. The situation corresponding to a B1 bridge, in which a water molecule forms H-bonds with two (sometimes three) hydrophilic centres simultaneously, corresponds to almost all potential energy minima in the system base + 1 water molecule (ref. 7). Arrangements corresponding to B2 bridges (in which each of the two water molecules forms an H-bond with the other and with a hydrophilic centre) are typical of potential energy minima in the systems base + 2 water molecules (ref. 7). In a B3 bridge, two water molecules form H-bonds with hydrophilic centres and with a third molecule, which is not H-bonded to the bases.
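The three bridge types can be detected mechanically from the list of H-bonds in a stored F or I structure. The sketch below assumes a hypothetical representation (a symmetric map from each base centre or water molecule to its H-bond partners, with water molecules named "w...") and returns the simplest bridge linking two centres according to the B1/B2/B3 definitions above. It is an illustration, not the authors' analysis code.

```python
def build_hbond_graph(pairs):
    """Symmetric adjacency sets built from a list of H-bonded node pairs."""
    graph = {}
    for u, v in pairs:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bridge_type(graph, centre_a, centre_b, is_water):
    """Return "B1", "B2" or "B3" for the simplest water bridge linking two centres, else None."""
    waters_a = {n for n in graph.get(centre_a, ()) if is_water(n)}
    waters_b = {n for n in graph.get(centre_b, ()) if is_water(n)}
    # B1: a single water H-bonded to both centres.
    if waters_a & waters_b:
        return "B1"
    # B2: two waters, each bonded to one centre, H-bonded to each other.
    for w1 in waters_a:
        if graph[w1] & waters_b:
            return "B2"
    # B3: the two centre-bound waters are joined through a third water
    # that has no H-bond to any base centre.
    for w1 in waters_a:
        for w3 in graph[w1]:
            if not is_water(w3):
                continue
            if all(is_water(n) for n in graph[w3]) and graph[w3] & waters_b:
                return "B3"
    return None


def is_water(name):
    return name.startswith("w")


chain = [("N3(1)", "w1"), ("w1", "w3"), ("w3", "w2"), ("w2", "O2(2)")]
print(bridge_type(build_hbond_graph(chain), "N3(1)", "O2(2)", is_water))
```

Checking B1 before B2 and B2 before B3 means that, when several bridges connect the same pair of centres, the one with the fewest water molecules is reported.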

Several water bridges may connect the same pair of hydrophilic centres in some F and I structures. B1 bridges are formed with high probability if the N...N, N...O or O...O distance between the centres is close to 4.3 Å. The probability of bridge formation between bases belonging to the same pair is lower in stacks than in separate base pairs (ref. 8). This is due to the formation of a three-dimensional H-bonded network enveloping the stack, in which many bridges between centres belonging to different base pairs are formed. Such bridges contribute to the stabilisation of the stack, and differences in the probabilities of their formation reflect the different degrees to which the aqueous environment stabilises the various DNA conformations. Let us consider some water bridges in more detail and describe characteristic elements of the hydration shell structure in the minor groove of the double helix. These elements can be identified by considering F structures (Fig. 1 is an example). Only those water molecules that participate in the formation of H-bonds with the bases or of water bridges are shown in this figure.
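The 4.3 Å observation suggests a simple geometric pre-screen for pairs of centres likely to carry B1 bridges. In the sketch below the atomic coordinates are hypothetical, and the ±0.3 Å window is an assumption, since the text says only "close to 4.3 Å".

```python
import math
from itertools import combinations

B1_DISTANCE = 4.3  # Angstrom, from the text
TOLERANCE = 0.3    # assumed window around 4.3 Angstrom


def b1_candidates(centres, target=B1_DISTANCE, tol=TOLERANCE):
    """Return (name1, name2, distance) for N/O centre pairs within tol of the target distance."""
    found = []
    for (name1, xyz1), (name2, xyz2) in combinations(centres.items(), 2):
        d = math.dist(xyz1, xyz2)
        if abs(d - target) <= tol:
            found.append((name1, name2, round(d, 2)))
    return found


# Hypothetical coordinates (Angstrom) of three minor-groove acceptors.
centres = {"O2(1)": (0.0, 0.0, 0.0), "O2(2)": (4.2, 0.0, 0.0), "N3(1)": (0.0, 6.0, 0.0)}
print(b1_candidates(centres))
```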
Water bridges and hydration shell structure in the minor groove.

The total probability of B1 bridge formation in this groove for B stacks is rather low (less than 16%). The location of the water molecules forming such bridges in A:U stacks is strikingly different from that characteristic of G:C stacks, owing to the presence of an additional hydrophilic centre, the amino group. Bridges involving this group are most frequently formed according to the scheme

A ... NH2 ... A
   W       W
A ... NH2 ... A

where A denotes H-bond acceptors (guanine N3 or cytosine O2) and W denotes water molecules. B1 bridges between uracil carbonyl oxygens of adjacent pairs are the most probable in both types of A:U stacks: repeating (8%) and alternating (11%).

Fig. 1. Water-base hydrogen bonds and water bridges in the minor groove of the alternating G:C stack. Stereo view of the F structure of the hydration shell for the A (top) and B (bottom) configurations.

B2 bridges between H-bond acceptor atoms are rather rare in both repeating B stacks; in the alternating stacks, the probability of such bridges connecting bases of adjacent pairs is much higher (e.g. between O2(1) and O2(2) this probability is 42% for the A:U stack and 20% for the G:C stack). A B2 bridge connecting the guanine NH2 group with another centre is even more probable in both types of G:C-containing stacks. In this case the pattern of H-bonded bridges can be represented schematically:

A ... NH2 ... A
A ... NH2 ... A

The probability of B3 bridges between H-bond acceptors in B stacks is somewhat higher than that of B2 bridges. Along with N3(1)...O2(2) and O2(1)...O2(2) bridges, N3(1)...O2(3) bridges are formed, which connect not adjacent pairs but the first and the third pairs. The probabilities of such bridges are higher for alternating G:C stacks than for A:U ones. In G:C stacks these bridges are stabilised by an additional H-bond between the central water molecule of the bridge and the guanine NH2 group (see Fig. 1).

The total numbers of H-bonds between hydrophilic centres of the grooves and water molecules (NHB) are given in Table 2, together with the numbers of water molecules participating in such H-bonds (NW). The latter number is always smaller than the former, because there are bridges in which one water molecule forms more than one H-bond with the bases. In the minor groove of all the G:C stacks, NHB and NW are greater than the corresponding characteristics of the A:U stacks, whereas the difference between these numbers for the major groove of A:U and G:C stacks is not significant. This behaviour is connected with the presence of an additional hydrophilic centre (the NH2 group) in the minor groove of the G:C stacks.

The NW/NHB ratio characterizes the degree of "water economy" in the grooves. For all the stacks studied, this ratio is smaller (i.e. water economy is greater) in the major groove than in the minor one. The "hydration economy" concept was proposed by Saenger and co-workers (ref. 15) to explain the mechanism of the B to A transition. As noted in ref. 15, two oxygens from neighbouring phosphate groups can be connected in the A form by one water molecule (a B1 bridge in our notation). Because the distance between these atoms is greater in the B form, such a bridge cannot form there, and at least two water molecules are needed to hydrate a phosphate group of B DNA.

Our model system contains no phosphate groups, but water economy is still observed when we compare the structure of the hydration shells of stacks in the A and B configurations. It is manifested as a decrease of NW/NHB at the B to A transition. At the level of F structures this trend is seen for all the sequences studied; for the ensemble of I structures, however, it does not hold for the repeating G:C stack. The ability of poly(dG)·poly(dC) to convert easily to the A form can be explained by the interaction of adjacent G:C base pairs, which is more favorable in the A conformation (ref. 11).

TABLE 2

Characteristics of hydration shells of double helix grooves(a)

                          configuration A            configuration B
groove                    1       2       1+2        1       2       1+2
NHB      (A:U)            11.36   21.11   32.47      10.69   21.86   32.55
         (A:U,U:A)        11.35   22.13   33.48      12.21   17.66   29.87
         (G:C,C:G)        19.65   21.43   41.08      17.93   21.88   39.81
         (G:C)            18.36   18.55   36.91      17.89   19.41   37.30
NW       (A:U)            9.0     14.2    23.2       10.2    17.4    27.6
         (A:U,U:A)        10.4    15.2    25.6       11.6    13.6    25.2
         (G:C,C:G)        16.8    16.4    33.2       16.6    16.1    32.7
         (G:C)            17.3    15.1    32.4       15.5    15.1    30.6
NW/NHB   (A:U)            0.79    0.67    0.72       0.95    0.79    0.85
         (A:U,U:A)        0.92    0.68    0.76       0.95    0.77    0.84
         (G:C,C:G)        0.86    0.76    0.81       0.93    0.74    0.82
         (G:C)            0.94    0.81    0.88       0.87    0.78    0.82

(a) NHB - the number of water-base H-bonds; NW - the number of water molecules forming H-bonds with bases. All values refer to six base pairs. Column 1 corresponds to the minor groove, column 2 to the major groove.
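The two water-economy trends discussed in the text (a smaller NW/NHB in the major groove than in the minor one, and a lower whole-groove NW/NHB in the A than in the B configuration, with the noted exception of the repeating G:C stack) can be checked against the NHB and NW entries of Table 2. The sketch below recomputes the ratios from those entries. Recomputed whole-groove ratios can differ from the printed NW/NHB row in the last digit (0.71 against 0.72 for the A form of (A:U)), so only the trends, not the printed values, should be compared.

```python
# NHB and NW from Table 2 (six base pairs): (NHB_minor, NHB_major, NW_minor, NW_major).
TABLE2 = {
    "(A:U)": {"A": (11.36, 21.11, 9.0, 14.2), "B": (10.69, 21.86, 10.2, 17.4)},
    "(A:U,U:A)": {"A": (11.35, 22.13, 10.4, 15.2), "B": (12.21, 17.66, 11.6, 13.6)},
    "(G:C,C:G)": {"A": (19.65, 21.43, 16.8, 16.4), "B": (17.93, 21.88, 16.6, 16.1)},
    "(G:C)": {"A": (18.36, 18.55, 17.3, 15.1), "B": (17.89, 19.41, 15.5, 15.1)},
}


def water_economy(nhb_minor, nhb_major, nw_minor, nw_major):
    """NW/NHB for the minor groove, the major groove and both grooves together."""
    return (nw_minor / nhb_minor,
            nw_major / nhb_major,
            (nw_minor + nw_major) / (nhb_minor + nhb_major))


results = {}
for seq, by_config in TABLE2.items():
    a = water_economy(*by_config["A"])
    b = water_economy(*by_config["B"])
    results[seq] = {
        "major_below_minor": a[1] < a[0] and b[1] < b[0],
        "decreases_B_to_A": a[2] < b[2],
    }
    print(f"{seq:10s} NW/NHB A = {a[2]:.2f}, B = {b[2]:.2f}, "
          f"major < minor: {results[seq]['major_below_minor']}, "
          f"decreases B->A: {results[seq]['decreases_B_to_A']}")
```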


Although our calculations take into account neither the sequence-dependent differences in the A and B configurations nor the hydration of the sugar-phosphate backbone, the qualitative results correctly reflect the tendency of the bases to arrange themselves so that a decrease in the water content of the system results in a decrease in the number of water molecules needed to hydrate all the hydrophilic centres.


For a correct description of DNA-water systems, the contributions of the sugar-phosphate backbone and of counterions should also be taken into account. Unfortunately, our computer resources are not sufficient to simulate such a system, because it would have to contain many more water molecules. These contributions may change the pattern of base hydration in the grooves, but this change should be rather small if the groove is wide enough. This is the case for all the configurations considered except the minor groove of the B' form. The minor groove of the B' conformation is narrowed, and its hydration pattern for the base pair stack differs significantly from that of the B'-DNA helix. Nevertheless, our results help to emphasize the role of base arrangement in the hydration shell structure in this case as well.
DISCUSSION


OHLENBUSCH - You have considered DNA to be a straight rod, but in solution DNA is generally bent, so that the grooves vary considerably in size depending on whether they are on the inside or the outside of the bend. Have you considered this effect in your calculations?

POLTEV - We have not carried out calculations for bent duplexes, as the nucleotide sequences considered have no tendency to bend. Bending takes place at the boundary between different conformations; so far we have considered only regular helices.


MAROUN - You define a "water economy" factor and calculated it for A- and B-DNA, trying to show support for the water economy idea of Saenger et al. Did you also calculate this factor for Z-DNA?

POLTEV - We have not yet performed any calculations for the Z conformation. I think we can try to estimate the characteristics of base hydration for this form as well.


GHOMI - What are the main structural characteristics of the B' conformation, especially for homopurine-homopyrimidine double helices?

POLTEV - The main structural characteristic of the B' conformation is a narrowed minor groove. We performed calculations for a configuration with negative tilt (~10°) and nearly zero propeller twist. Other low-energy B' conformations can have a large propeller twist and nearly zero tilt. We hope that the main hydration features of the various B' configurations will be rather similar.


WIPFF - Concerning the polymorphism of DNA as a function of humidity: changing the humidity also changes the concentration of cations near the phosphates in the groove. What qualitative features in the hydration pattern would you expect from the explicit consideration of these ions in the simulation? Cations obviously interact more strongly with water than the bases do.


POLTEV - Explicit consideration of ions in the simulation will change the global characteristics of duplex hydration, but not the relative characteristics for various sequences. We hope that the differences in hydration patterns between sequences will be similar to those obtained without taking ions into account (except for helices with a narrowed groove).
THEORETICAL PREDICTION OF BASE PAIR SEQUENCE EFFECTS IN Z-DNA AND Z-RNA


B.  HARTMANN  and  R.  LAVERY 

Institut  de  Biologie  Physico-Chimique,  13  rue  Pierre  et  Marie 
Curie, Paris  75005  (France) 

SUMMARY 

Base sequence effects within double-stranded DNA and RNA oligomers in the Z conformation have been studied by molecular modeling, using a methodological approach specifically adapted to nucleic acids. Calculations on symmetric oligomers having homonucleotide or dinucleotide repeating base sequences show that sequence changes can modify the overall conformation and strongly affect stability. Within the Z family, it is demonstrated that certain sequences can adopt more than one polymorphic form. Transition enthalpies, B-Z for DNA and A-Z for RNA, are calculated as a function of base sequence.

INTRODUCTION 

In contrast to the situation for B-DNA, little is known about the influence of base sequence on the left-handed Z-DNA conformation, other than the fact that such effects exist (4,5,6), although Z-DNA has been implicated in several biological systems (1,2,3). Likewise, it has recently been confirmed that double-stranded RNA can exist in the Z form, but only a single crystal structure showing this conformation, that of BrGCBrGC, has been reported (7).

These  facts  have  encouraged  us  to  undertake  a  general 
molecular  modeling  project  aimed  at  understanding  the 
conformational  and  energetic  details  of  Z  forms  of  DNA  and  RNA  as 
a  function  of  base  sequence. 

METHODOLOGY 

The calculations presented have been performed with the Jumna algorithm, which was specifically designed for energy minimisation of nucleic acid oligomers. This methodology and the energy formulation are fully described elsewhere (8,9).

All the calculations reported refer to 8 and 12 base pair oligomers of DNA and RNA, respectively. We have studied Z conformations with all possible homonucleotide and dinucleotide repeating sequences. Consequently, we have been able to impose dinucleotide symmetry on each oligomer treated. Exact symmetry is maintained within each strand, but heteronomous differences are allowed to develop between the two strands of the duplex. We calculate the "environmental" energy (Eenv) of the central dinucleotide as follows:

$$E_{\mathrm{env}} = E_c + \frac{1}{2} \sum_{j \neq c} E_{cj},$$

where Ec is the internal energy of the central dinucleotide pair and Ecj is the interaction energy between this pair and another dinucleotide pair j of the oligomer. This has the advantage both of reducing the computational time for energy minimisation and of eliminating end effects.
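A sketch of this bookkeeping, assuming the energies are available as a symmetric matrix E whose diagonal holds the internal energy of each dinucleotide pair and whose off-diagonal elements hold pair-pair interactions (the matrix layout and the numbers are hypothetical). The factor 1/2 assigns half of each interaction to each partner, so summing Eenv over all pairs recovers the total energy without double counting.

```python
def environmental_energy(E, c):
    """Eenv of block c: its internal energy plus half of its interactions with all other blocks."""
    n = len(E)
    for i in range(n):
        if len(E[i]) != n:
            raise ValueError("E must be a square matrix")
        for j in range(i + 1, n):
            if abs(E[i][j] - E[j][i]) > 1e-9:
                raise ValueError("E must be symmetric")
    return E[c][c] + 0.5 * sum(E[c][j] for j in range(n) if j != c)


# Hypothetical energies (kcal/mol) for five dinucleotide pairs:
# diagonal = internal energy, off-diagonal = pair-pair interaction.
E = [
    [-5.0, -1.0, -0.2, 0.0, 0.0],
    [-1.0, -5.0, -1.0, -0.2, 0.0],
    [-0.2, -1.0, -5.0, -1.0, -0.2],
    [0.0, -0.2, -1.0, -5.0, -1.0],
    [0.0, 0.0, -0.2, -1.0, -5.0],
]
e_env_central = environmental_energy(E, 2)
total = sum(E[i][i] for i in range(5)) + sum(E[i][j] for i in range(5) for j in range(i + 1, 5))
print(e_env_central)
print(sum(environmental_energy(E, c) for c in range(5)), total)
```

For the central pair of a long symmetric oligomer, this gives an energy per repeat unit that is not biased by the terminal pairs.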

In order to represent the electrostatic damping associated with aqueous solution and the presence of a counterion atmosphere, we reduce the net charge on each phosphate group to -0.5 and employ a distance-dependent dielectric function (10). This function has a slope S = 0.356, corresponding to strong damping conditions (Z conformation).
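Ref. (10) is not reproduced in this section, so the exact form of the function is an assumption here: the sketch uses the sigmoidal distance-dependent dielectric of Hingerty et al., eps(r) = D - (D - 1)/2 [(Sr)^2 + 2Sr + 2] exp(-Sr), which rises from 1 at contact towards a plateau D (taken here as 78) with a steepness set by the slope S (in Å^-1). The reduced phosphate charge of -0.5 and S = 0.356 come from the text; the conversion factor of 332.06 kcal·Å/(mol·e²) is the usual Coulomb constant in these units.

```python
import math

COULOMB = 332.06          # kcal*Angstrom/(mol*e^2)
D_PLATEAU = 78.0          # assumed bulk plateau of the sigmoidal function
PHOSPHATE_CHARGE = -0.5   # reduced net phosphate charge, from the text


def sigmoidal_dielectric(r, slope=0.356, plateau=D_PLATEAU):
    """Assumed Hingerty-type sigmoidal dielectric: 1 at r = 0, tending to `plateau` at large r."""
    x = slope * r
    return plateau - 0.5 * (plateau - 1.0) * (x * x + 2.0 * x + 2.0) * math.exp(-x)


def damped_coulomb(q1, q2, r, slope=0.356):
    """Electrostatic energy (kcal/mol) of two point charges r Angstrom apart."""
    if r <= 0.0:
        raise ValueError("r must be positive")
    return COULOMB * q1 * q2 / (sigmoidal_dielectric(r, slope) * r)


for r in (3.0, 7.0, 15.0):
    print(f"r = {r:4.1f} A  eps = {sigmoidal_dielectric(r):5.1f}  "
          f"E(P-P) = {damped_coulomb(PHOSPHATE_CHARGE, PHOSPHATE_CHARGE, r):.3f} kcal/mol")
```

A larger slope makes the dielectric approach its plateau at shorter distances, which is why a high value corresponds to strong damping.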

Due to the alternation of syn and anti nucleotides in the backbones of the Z conformation, 10 oligomers are necessary to cover all the unique homonucleotide and dinucleotide sequences. The first base of these dimers is always a syn nucleotide, while the second is always anti. The number of errors listed refers to the number of times the usual syn-purine/anti-pyrimidine correlation is violated in each dimer (we show only one strand of the duplex concerned, in the 5'-3' sense):

..CGCGCGCG..   G(syn)pC(anti)   0 errors
..CACACACA..   A(syn)pC(anti)   0 errors = GpT or GpU
..TATATATA..   A(syn)pT(anti)   0 errors
..UAUAUAUA..   A(syn)pU(anti)   0 errors
..GGGGGGGG..   G(syn)pG(anti)   1 error  = CpC
..AAAAAAAA..   A(syn)pA(anti)   1 error  = TpT or UpU
..GAGAGAGA..   A(syn)pG(anti)   1 error  = CpT or CpU
..AGAGAGAG..   G(syn)pA(anti)   1 error  = TpC or UpC
..GCGCGCGC..   C(syn)pG(anti)   2 errors
..ACACACAC..   C(syn)pA(anti)   2 errors = TpG or UpG
..ATATATAT..   T(syn)pA(anti)   2 errors
..AUAUAUAU..   U(syn)pA(anti)   2 errors

From now on, following the central dimer, these fragments are referred to, respectively, as GC, AC, AT, AU, GG, AA, AG, GA, CG, CA, TA and UA.
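The error count follows mechanically from the rule stated above: in a 5'-X(syn)pY(anti)-3' dimer named "XY", a pyrimidine in the syn position and a purine in the anti position each count as one violation of the syn-purine/anti-pyrimidine correlation. A short sketch (the function name is illustrative):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def syn_anti_errors(dimer):
    """Violations of syn-purine/anti-pyrimidine in a 5'-X(syn)pY(anti)-3' dimer named "XY"."""
    if len(dimer) != 2:
        raise ValueError("expected a two-letter dimer name")
    syn_base, anti_base = dimer[0], dimer[1]
    for base in (syn_base, anti_base):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    errors = 0
    if syn_base in PYRIMIDINES:  # pyrimidine forced into the syn conformation
        errors += 1
    if anti_base in PURINES:  # purine in the anti position of the dimer
        errors += 1
    return errors


FRAGMENTS = ["GC", "AC", "AT", "AU", "GG", "AA", "AG", "GA", "CG", "CA", "TA", "UA"]
for name in FRAGMENTS:
    print(name, syn_anti_errors(name))
```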
As starting conformations for energy minimisation we have used the idealized ZI crystallographic results of Rich (11). However, changes in starting geometry were found to be unimportant. In particular, modifying the sequence of an energy-optimised oligomer and then reminimising allowed all the conformational features of the new sequence to be recovered.

RESULTS  AND  DISCUSSION 

The energy optimisation of the oligomers described above leads to the results shown in Table 1. Note that for certain base sequences more than one low-energy conformation was found. In addition, considerable conformational changes were observed, which can be characterized by the shift of the base pairs from the helical axis. In the table, the various results are segregated on the basis of this parameter.


TABLE 1

Energies (Eenv) (kcal/mol) of Z-DNA and Z-RNA as a function of sequence, ordered horizontally in terms of base pair shift.

DNA sequence   shift < 3 Å   shift > 3 Å   RNA sequence   shift < 3 Å   shift > 3 Å
GC             -22.3         -             GC             7.9           17.2
AC             -13.2         -             AC             15.0          22.2
AT             -6.7          -             AU             21.5          33.1
GG             -19.1         -19.9         GG             10.6          11.9
AA             -4.2          -3.8          AA             23.5          28.0
AG             -11.2         -11.8         AG             15.9          20.5
GA             -12.6         -12.8         GA             17.8          19.2
CG             -             -18.7         CG             12.0          12.4
CA             -             -11.2         CA             19.7          20.1
TA             -3.3          -3.2          UA             24.9          27.6

If we consider the lowest-energy state in each case, we obtain the following order of stability:

For Z-DNA:

Dimers: GC > GG > CG > AC > GA > AG > CA > AT > AA > TA
Errors: 0    1    2    0    1    1    2    0    1    2

For Z-RNA:

Dimers: GC > GG > CG > AC > AG > GA > CA > AU > AA > UA
Errors: 0    1    2    0    1    1    2    0    1    2
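The stability order is obtained by taking, for each sequence, the lower of its tabulated Eenv values and sorting in increasing energy. A minimal sketch for Z-RNA, with the values copied from Table 1 (the function name is illustrative):

```python
# Z-RNA Eenv values (kcal/mol) from Table 1: (shift < 3 A, shift > 3 A).
Z_RNA = {
    "GC": (7.9, 17.2), "AC": (15.0, 22.2), "AU": (21.5, 33.1),
    "GG": (10.6, 11.9), "AA": (23.5, 28.0), "AG": (15.9, 20.5),
    "GA": (17.8, 19.2), "CG": (12.0, 12.4), "CA": (19.7, 20.1),
    "UA": (24.9, 27.6),
}


def stability_order(energies):
    """Rank sequences by the energy of their lowest-energy conformation, most stable first."""
    lowest = {seq: min(values) for seq, values in energies.items()}
    return sorted(lowest, key=lowest.get)


print(" > ".join(stability_order(Z_RNA)))
```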
This order shows that the classical G(syn)pC(anti) sequence is indeed found to be the most stable in the Z form of both nucleic acids. The energetic order correlates primarily with the number of G:C pairs in each sequence, but within each of these groups there is also a clear ordering in terms of the number of errors.

To estimate the B-Z and A-Z transition enthalpies, we have also studied B and A oligomers under high dielectric damping conditions (Table 2). These results have been described in detail in previous publications (12,13).


TABLE 2

Energies (Eenv) (kcal/mol) of B-DNA and A-RNA as a function of sequence under high dielectric damping conditions

DNA sequence   Energy    RNA sequence   Energy
GC             -26.8     GC             -3.2
AC             -18.8     AC             5.5
AT             -11.9     AU             12.3
GG             -26.8     GG             -1.9
AA             -13.1     AA             10.3
GA             -19.5     GA             5.2

From these results we can now estimate the enthalpies of transition (ΔH_tran) between the B or A and the Z allomorphs. Although the present calculations include neither entropic effects nor explicit solvent or counterion interactions, it is interesting to compare the order we deduce with the available experimental results for Z-DNA. These experimental results, which are transition free energies, have been obtained directly only for the sequences marked with an asterisk; the remaining values were deduced from simple arguments based on stacking energies (14).
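The estimate is not written out explicitly in the text. Assuming that ΔH_tran is taken as the Eenv of the lowest-energy Z conformation (Table 1) minus the Eenv of the B or A form (Table 2), the sketch below reproduces the listed values for the sequences it includes (GC 4.5, GG 6.9 and AA 8.9 kcal/mol for DNA; AU 9.2 kcal/mol for RNA). Like the original estimate, it ignores entropy, explicit solvent and counterions.

```python
def transition_enthalpy(z_energies, start_energy):
    """dH_tran estimate: lowest Z-form Eenv minus the Eenv of the starting (B or A) form."""
    return min(z_energies) - start_energy


CASES = {
    # name: (Z-form Eenv values from Table 1, B- or A-form Eenv from Table 2), kcal/mol
    "GC, B-Z (DNA)": ((-22.3,), -26.8),
    "GG, B-Z (DNA)": ((-19.1, -19.9), -26.8),
    "AA, B-Z (DNA)": ((-4.2, -3.8), -13.1),
    "AU, A-Z (RNA)": ((21.5, 33.1), 12.3),
}
estimates = {name: transition_enthalpy(z, start) for name, (z, start) in CASES.items()}
for name, dh in estimates.items():
    print(f"{name}: {dh:.1f} kcal/mol")
```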


B-Z transition for DNA (ΔH_tran in kcal/mol):

Order:      GC > AT > AC > GA > GG > AG > CA = TA > CG > AA
ΔH_tran:    4.5  4.9  5.2  6.7  6.9  7.6  7.7  8.1  8.6  8.9
Errors:     0    0    0    1    1    1    2    2    2    1

Experimental: GC* > AC* > AT* = GG* > GA* > AG > AA > CG > CA > TA

A-Z transition for RNA (ΔH_tran in kcal/mol):

Order:      AU > AC > GC = GA > GG = AG = UA > AA > CG > CA
ΔH_tran:    9.2  10.5 10.8 10.8 12.5 12.6 12.6 13.1 15.2 15.3
Errors:     0    0    0    1    1    1    2    1    2    2
For Z-DNA, we note a good correlation between the experimental and predicted orders. All these results show a strong correlation with the number of syn-purine/anti-pyrimidine errors. It thus appears that the relative instability of pyrimidine nucleotides in the syn conformation, which has been recognised both experimentally and theoretically (15 and references therein), is an important factor controlling the stability of the Z conformation for a given sequence.

One  can  also  note  from  these  results  that  the  relative  difficulty 
of  converting  RNA  to  the  Z  conformation  (16)  is  reflected  by 
calculated  transition  enthalpies  which  are  roughly  twice  as  large 
as  those  we  have  obtained  for  DNA  oligomers. 

For many of the sequences studied we have been able to locate several stable conformations. The details of all these structures are given in Tables 3 and 4. The most appropriate way to distinguish among the different conformational possibilities is the shift of the base pairs from the helical axis, since very large variations of this parameter are found. It can be seen that the shift is related to the number of errors: no errors implies small shifts, one error can lead to either small or large shifts, and two errors generally lead to large shifts (see also Table 1).

We now return to the details of the conformations. We note a strong difference in twist between the syn-p-anti steps and the anti-p-syn steps. Generally, a small twist (anti-p-syn step) corresponds to a large rise, and vice versa.

For small-shift conformations, we also note generally moderate positive tilts and small negative propeller twists. The Z conformation of RNA allows a variety of sugar-base hydrogen bonds to form in the minor groove. This possibility is limited to syn-p-anti dinucleotide steps, where the large twist brings the ribose hydroxyl of the anti nucleotide close to the base of the anti nucleotide in the opposing strand. Such bonding, which occurs whenever the latter base is a pyrimidine (bonding to the O2 atom) or adenine (bonding to N3), is a contributing factor to the small shift values and to the outward buckling of both base pairs forming the syn-p-anti dinucleotide.

Turning to the structures with large shift, we note moderate propeller twists but large negative tilts, and an overall variation in twist, which increases for the syn-p-anti steps and decreases for the anti-p-syn steps. For Z-RNA, intra- or
TABLE 3

Helicoidal  parameters  for  the  optimized  conformations  of  Z-DNA  and 
Z-RNA  as  a  function  of  base  sequence. 


Z-DNA  Z-RNA 


Seq. 

Shift 

Tilt 

Prop. 

Buckle 

Seq. 

Shift 

Tilt 

Prop. 

Buckle 

Small 

shift 

GC 

1.2 

14.5 

-1.1 

4.9 

GC 

0.5 

10.6 

-  7.9 

-11.1 

1.2 

14.5 

-1.1 

-4.9 

0.5 

10.6 

-  7.9 

11.1 

AC 

1.5 

10.3 

0.5 

8.6 

AC 

0.6 

8.6 

-10.5 

-11.7 

1.5 

10.7 

-1.4 

-4.4 

0.7 

9.1 

-11.8 

10.9 

AT 

1.0 

4.6 

-3.6 

0.3 

AU 

0.5 

7.8 

-  8.4 

-11.6 

1.0 

4.6 

-3.6 

-0.3 

0.5 

7.8 

-  8.4 

11.6 

GG 

1.1 

14.4 

-0.3 

5.0 

GG 

0.1 

14.8 

-  6.2 

-  5.0 

1.3 

12.0 

1.8 

-4.9 

0.3 

14.3 

-  3.1 

13.5 

AA 

2.3 

6.0 

4.2 

8.8 

AA 

0.9 

4.8 

-  3.7 

-  7.0 

2.3 

6.2 

-3.4 

-10.9 

1.0 

5.2 

-  3.2 

8.8 

AG 

1.8 

8.3 

0.7 

12.2 

AG 

0.1 

12.6 

-  1.6 

3.9 

1.7 

10.4 

5.8 

-11.9 

0.4 

16.7 

-10.1 

-16.5 

GA 

3.0 

2.2 

-7.6 

12.8 

GA 

0.3 

10.5 

-  8.8 

-10.3 

3.1 

4.0 

-1.5 

5.5 

0.4 

8.8 

-  2.7 

-12.4 

CG 

2.1 

8.7 

-  3.4 

5.8 

2.1 

8.7 

-  3.4 

-  5.8 

CA 

2.1 

8.1 

-  1.8 

1.8 

2.0 

8.7 

-  4.6 

-  5.6 

TA 

1.5 

10.0 

-5.7 

2.0 

UA 

1.5 

4.3 

-  8.1 

-  8.8 

1.5 

10.0 

5.7 

-2.0 

1.5 

4.3 

-  8.1 

8.8 

Large 

shift 

GC 

5.9 

-21.0 

-  2.9 

7.8 

5.9 

-21.0 

-  2.9 

-  7.8 

AC 

6.0 

-24.0 

2.1 

11.4 

5.8 

-19.1 

-  7.4 

-12.2 

AU 

5.3 

-21.0 

-  0.4 

10.7 

5.3 

-21.0 

-  0.4 

-10.7 

GG 

4.8 

-  3.1 

-7.2 

13.3 

GG 

5.3 

-  9.8 

-12.0 

9.4 

4.9 

-10.3 

0.5 

-14.8 

5.4 

-17.1 

0.3 

-15.7 

AA 

6.1 

-21.5 

0.9 

15.0 

AA 

5.5 

-18.1 

-  1.4 

13.7 

6.0 

-23.1

-1.5 

-17.6 

5.5 

-19.0 

-  2.0 

-12.8 

AG 

6.6 

-22.5 

4.9 

20.1 

AG 

5.5 

-20.6 

2.8 

-10.2 

6.5 

-18.6 

0.1 

-20.4 

5.3 

-18.7 

-  3.6 

11.8 

GA 

4.8 

-12.3 

-3.1 

17.0 

GA 

5.5 

-  9.9 

-12.9 

-12.0 

4.8 

-  5.2 

-8.4

-11.1 

5.5

-17.6 

-  3.2 

17.3 

CG 

5.4 

-13.3 

4.1 

13.7 

CG 

5.4 

-10.6 

4.5 

15.9 

5.4 

-13.3 

4.1 

13.7 

5.4 

-10.6 

4.5 

-15.9 

CA 

4.8 

-10.2 

3.0 

11.6 

CA 

4.8 

-  8.3 

4.2 

13.6 

4.8 

-  9.0 

0.6 

-11.8 

4.8 

-  5.8 

0.3 

-12.5 

TA 

4.5 

-10.4 

0.7 

8.8 

UA 

4.7 

-10.0 

0.1 

9.7 

4.5 

-10.4 

0.7 

8.8 

4.7 

-10.0 

0.1 

-  9.7 
TABLE 4

Helicoidal parameters obtained for the different Z-RNA oligomers (all anti nucleotides are indicated by a prime).


Z-DNA   Z-RNA

Sequence  Dimer 

Rise 

Twist 

Sequence 

Dimer 

Rise 

Twist 

Small 

shift 

GC' 

C'pG 

4.7 

-  8.9 

GC' 

C'pG 

3.6 

-13.2 

GpC' 

2.7 

-49.2 

GpC' 

3.3 

-50.1 

AC' 

C'pA 

4.6 

-10.0 

AC' 

C'pA 

3.5 

-14.3 

ApC' 

2.8 

-46.1 

ApC' 

3.4 

-51.1 

AT' 

T'pA 

3.7 

-14.6 

AU' 

U'pA 

3.3 

-14.9 

ApT' 

3.0 

-41.6 

ApU' 

3.4 

-49.7 

GG' 

G'pG 

4.7 

-11.0 

GG' 

G'pG 

3.9 

-14.4 

GpG' 

2.6 

-45.0 

GpG' 

2.9 

-45.7 

AA' 

A'pA 

4.4 

-10.5 

AA' 

A'pA 

3.4 

-13.6 

ApA' 

2.7 

-44.4 

ApA' 

3.3 

-45.5 

GA' 

A'pG 

4.2 

-  8.1 

GA' 

A'pG 

3.9 

-14.9 

GpA' 

2.8 

-50.5 

GpA' 

3.0 

-45.2 

AG' 

G'pA 

5.0 

-11.7 

AG' 

G'pA 

3.6 

-14.0 

ApG' 

2.4 

-39.9 

ApG' 

3.2 

-45.1 

CG' 

G'pC 

4.7 

-10.4 

CpG' 

2.5 

-46.8 

CA' 

A'pC 

4.4 

-10.3 

CpA' 

2.7 

-48.4 

TA' 

A'pT 

4.4 

-  9.9 

UA' 

A'pU 

3.4 

-14.0 

TpA' 

2.7 

-46.1 

UpA' 

3.2 

-46.3 

Large 

Shift 

GC' 

C'pG 

1.7 

-  4.5 

GpC' 

3.5 

-63.7 

AC' 

C'pA 

1.9 

-  5.9 

ApC' 

3.2 

-61.6 

AU' 

U'pA 

2.1 

-10.3 

ApU' 

3.4 

-54.7 

GG' 

G'pG 

3.7 

-  8.0 

GG' 

G'pG 

3.0 

-  8.2 

GpG' 

2.8 

-50.2 

GpG' 

2.9 

-53.9 

AA' 

A'pA 

2.0 

-  7.2 

AA' 

A'pA 

2.3 

-  7.7 

ApA' 

3.1 

-57.6 

ApA' 

3.1 

-57.1 

AG' 

G'pA 

3.4 

-  6.9 

AG' 

G'pA 

2.2 

-  9.8 

ApG' 

2.7 

-52.9 

ApG' 

3.3 

-54.8 

GA' 

A'pG 

2.5 

-  6.7 

GA' 

A'pG 

3.0 

-  5.5 

GpA' 

2.8 

-53.4 

GpA' 

2.7 

-56.8 

CG' 

G'pC 

3.3 

-  8.5 

CG' 

G'pC 

3.7 

-  7.5 

CpG' 

2.9 

-48.8 

CpG' 

2.8 

-48.7 

CA' 

A'pC 

3.4 

-  7.7 

CA' 

A'pC 

3.8 

-  7.1 

CpA' 

2.9 

-49.8 

CpA' 

2.8 

-50.1 

TA' 

A'pT 

3.1 

-  7.4 

UA' 

A'pU 

3.2 

-  6.6 

TpA' 

3.1 

-51.8 

UpA' 

3.0 

-52.9 

inter sugar-base hydrogen bonds can be formed: between the guanine amino group and a sugar O2' in the opposing strand (for GG and AG), between the H(N2) atom of guanine and the O2' ribose hydroxyl on the 3' side (for GC and AC), or between the N3 atoms of guanine or adenine and the ribose hydroxyl of the attached sugar (for GG, CG and CA).

In all cases, we find that syn nucleotides are associated with C3'-endo sugars, while anti nucleotides have C2'-endo sugars. Despite the important changes in helicoidal parameters we have described, the backbone geometry remains remarkably close to that of the ZI conformation and shows very little variation.

For Z-DNA, the most stable calculated conformation of the GC sequence is overall very similar to the crystallographic result. In the case of Z-RNA, the only crystallographic structure, obtained for the tetramer BrGCBrGC (7), is compatible with the less stable conformation that we have located in calculations with 12 base pairs (large shift and intra-strand hydrogen bond). This result may be due in part to steric hindrance introduced by the bromine atoms. However, to test for oligomer-length effects, we have also performed energy optimisations of the GC sequence using only 4 base pairs. For this fragment, the large-shift conformation does indeed become more stable, owing to decreased inter-strand phosphate repulsion. Consequently, it should be stressed that structural studies of short oligomers should be used only with great caution in making deductions about longer stretches of nucleic acid, particularly when more than one conformational form can be adopted.


CONCLUSION 

The molecular modelling carried out brings to light the fact that considerable polymorphism can exist within the Z family. This polymorphism, while experimentally confirmed in solution studies, has not yet been structurally characterized. In connection with our studies, it is particularly interesting to note that the chemical reactivity of the bases within natural Z-DNA sequences has been found to be highly variable and, for a particular base, may also evolve as a function of the superhelical density (6,17,18). By comparing our results with these variations, we have been able to suggest that the highly reactive zones within Z tracts may indeed be associated with large conformational discontinuities characterized by changes in base pair shift, and, furthermore, that the evolution of reactivity as a function of negative supercoiling can be understood if we take into account the fact that certain sequences can adopt more than one conformation with very little change in energy (12).

As concerns ribonucleic acid, our results suggest that the experimentally described Zd and Zr forms of Z-RNA (19,20) may well be explained in terms of the two stable conformations we have found, which are again characterized by shift changes (13).

Comparisons  with  available  experimental  data  thus  appear  to 
justify  the  existence  of  the  polymorphism  within  the  Z  family 
indicated  by  our  modeling  study  and  also  support  a  number  of  the 
detailed  base  sequence  effects  that  we  have  described. 
SUMMARY

The modeling program AMBER 3.0 (ref. 1) was used to study the conformations adopted by the C8-substituted guanosine adduct of the carcinogen acetylaminofluorene (AAF), called dGuo-AAF. For that purpose, we added to AMBER a subroutine allowing the use of a distance-dependent dielectric constant of the form suggested by Lavery et al. (ref. 2). One of the minimized conformations, compatible with the Z-DNA form, was chosen and inserted into the hexamer d(CGCGCG)2. The results of the minimizations and molecular dynamics (MD) simulations depend markedly on two choices: the dielectric constant, and the weight given to the 1-4 electrostatic contributions (scale factor SCEE in AMBER).

METHODS 

The  energy  potential  function  used  in  our  molecular  mechanics  and  molecular 
dynamics  simulations  has  the  form 

$$E(\mathbf{r}) = V_b + V_\theta + V_\phi + V_{vdW} + V_{es} + V_{hb}$$

where r is the 3N-dimensional vector specifying the Cartesian coordinates of the N atoms of the molecule. The first three terms correspond to deformations of the covalent structure: the deformation of bonds (b), the deformation of bond angles (θ) and torsional rotations about bonds (φ). The last three terms correspond to the nonbonded interactions, divided into van der Waals (vdW), electrostatic (es) and hydrogen-bond (hb) contributions.

In this potential function, the electrostatic term is a major source of error, especially for charged molecules such as nucleic acids. To reduce such errors, explicit solvent molecules could be included in the calculations, at the cost of much longer computation times and possible convergence problems. If those molecules are not included explicitly, how can solvent and charge-screening effects be simulated effectively?

One approach uses a distance-dependent dielectric constant ε(r), where r is the interatomic distance (ref. 3, and references therein).

Another approach is to scale the partial atomic charges. In nucleic acid simulations, the screening due to counterions is commonly represented by reducing the charges on the phosphate groups (refs. 4-5).

However, accurately representing electrostatic effects in polyanionic nucleic acids is very difficult, and each of the previous molecular dynamics studies of DNA has adopted a different approach. The following table lists several of them.


Approaches used in some MD simulations

- Electrostatic interactions are neglected; solvent molecules are not included (ref. 4).
- A distance-dependent dielectric constant, ε(r) = r, is used; phosphate charges are scaled down to -0.2 (ref. 5).
- Simulations with fully charged phosphates and counterions; distance-dependent dielectric constants ε(r) = r and ε(r) = 4r are used (ref. 6).
- Simulations with counterions; solvent molecules are included explicitly; a dielectric constant ε = 1 is used (ref. 7).
- Solvent molecules are included explicitly; a dielectric constant ε = 1 is used; atomic partial charges are scaled so that the system is electrically neutral (ref. 8).

In the table, the distance-dependent dielectric constants are ε(r) = r and ε(r) = 4r, but other functions have been proposed. We have studied two of them:

$$\varepsilon(r) = A + \frac{B}{1 + k\,e^{-\lambda B r}}$$

where A = -20.929, B = 99.329, λ = 0.001787, k = 3.4781 (ref. 9);

$$\varepsilon(r) = D - \frac{D-1}{2}\left[(\lambda r)^2 + 2\lambda r + 2\right]e^{-\lambda r}$$

where D = 78, C = 2.674, H = 7.5, λ = C/H (ref. 2).
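The following is a minimal Python sketch of the two sigmoidal dielectric functions. It assumes the form ε(r) = A + B/(1 + k e^(-λBr)) for ref. 9 and ε(r) = D - (D-1)/2 [(λr)² + 2λr + 2] e^(-λr) for Lavery et al. For the latter, λ is taken as C/H, which is our reading of the listed constants. The function names are illustrative. The script prints both functions next to ε(r) = 4r. Both start close to 1 at contact and level off at a bulk value (78.4 and 78, respectively).

```python
import math

# Ref. 9 form (assumed): eps(r) = A + B / (1 + k * exp(-lam * B * r))
A, B, LAM_SIG, K = -20.929, 99.329, 0.001787, 3.4781

# Lavery et al. form (ref. 2); the slope lam = C / H is an assumption
D, C, H = 78.0, 2.674, 7.5
LAM_CAL = C / H  # roughly 0.357 per angstrom


def eps_sigmoid(r: float) -> float:
    """Sigmoidal dielectric of ref. 9; r in angstroms."""
    return A + B / (1.0 + K * math.exp(-LAM_SIG * B * r))


def eps_cal(r: float) -> float:
    """Dielectric of Lavery et al.: equals 1 at r = 0 and tends to D at large r."""
    x = LAM_CAL * r
    return D - (D - 1.0) / 2.0 * (x * x + 2.0 * x + 2.0) * math.exp(-x)


if __name__ == "__main__":
    for r in (0.0, 2.0, 5.0, 10.0, 20.0, 40.0):
        print(f"r={r:5.1f}  sigmoid={eps_sigmoid(r):6.2f}  "
              f"cal={eps_cal(r):6.2f}  4r={4 * r:6.1f}")
```

The value at r = 0 follows directly from each formula: for the ref. 9 function it is A + B/(1 + k) ≈ 1.25, and for the Lavery form it is exactly 1.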
The diagrams below show how different distance-dependent dielectric constants ε(r), and the associated electrostatic potentials (in kcal mol⁻¹), vary as a function of the interatomic distance r:

$$V_{es}(r) = \frac{332\,q_i q_j}{\varepsilon(r)\,r}$$

with q_i = q_j = 0.3 units of proton charge.
[Figure: ε(r) and the associated electrostatic potential as functions of r, including the curve for ε(r) = 4r.]


These figures show that, for r < 20 Å, ε(r) = 4r is a good approximation to the sigmoidal functions. On the other hand, electrostatic effects appear to be overestimated with ε(r) = r. This is especially true at distances larger than the Debye length (8-10 Å), and for helical nucleic acids, in which most atoms are accessible to solvent molecules.
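The comparison can also be made numerically. The sketch below evaluates the Coulomb energy 332 q_i q_j / (ε(r) r) for q_i = q_j = 0.3 under three choices: ε = r, ε = 4r, and the Lavery et al. function. The Lavery form is assumed to be ε_cal with slope λ = C/H, and it is redefined here so that the block stands alone. The model dictionary is only for illustration.

```python
import math

COULOMB = 332.0   # kcal mol^-1 angstrom e^-2, the prefactor used in the text
QI = QJ = 0.3     # partial charges in proton units, as in the text


def eps_cal(r: float, d: float = 78.0, lam: float = 2.674 / 7.5) -> float:
    """Assumed Lavery et al. dielectric with slope lam = C / H."""
    x = lam * r
    return d - (d - 1.0) / 2.0 * (x * x + 2.0 * x + 2.0) * math.exp(-x)


MODELS = {
    "eps=r": lambda r: r,
    "eps=4r": lambda r: 4.0 * r,
    "eps_cal": eps_cal,
}


def v_es(r: float, eps) -> float:
    """Screened Coulomb energy (kcal/mol) between charges QI and QJ at distance r (angstrom)."""
    return COULOMB * QI * QJ / (eps(r) * r)


if __name__ == "__main__":
    for r in (3.0, 5.0, 10.0, 20.0):
        row = "  ".join(f"{name}={v_es(r, f):7.4f}" for name, f in MODELS.items())
        print(f"r={r:5.1f}  {row}")
```

Under these assumptions, ε_cal stays within roughly 0.9 to 1.4 times 4r between 5 and 20 Å. By contrast, ε = r gives energies four times larger than ε = 4r at every distance.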

The function developed by Lavery et al. (ref. 2), called ε_cal, has the advantage of being more sensitive at small interatomic distances r, which increases the accuracy of the calculations. Moreover, its first and second derivatives are continuous, an essential condition for energy-minimization algorithms. We therefore decided to use this dielectric function in the electrostatic energy calculations. For this purpose, AMBER was modified by adding a subroutine that allows the use of this distance-dependent dielectric constant.
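The continuity claim can be checked numerically, assuming the Lavery et al. form (called ε_cal in the text) with λ = C/H. Differentiating gives:

- dε/dr = (D-1)/2 · λ³ r² e^(-λr)
- d²ε/dr² = (D-1)/2 · λ³ (2r - λr²) e^(-λr)

Both derivatives are smooth for r ≥ 0 and vanish at r = 0. The sketch compares them with central finite differences.

```python
import math

D, LAM = 78.0, 2.674 / 7.5  # lam = C / H is an assumption


def eps_cal(r: float) -> float:
    """Assumed Lavery et al. dielectric."""
    x = LAM * r
    return D - (D - 1.0) / 2.0 * (x * x + 2.0 * x + 2.0) * math.exp(-x)


def deps_dr(r: float) -> float:
    """Analytic first derivative of eps_cal."""
    return (D - 1.0) / 2.0 * LAM**3 * r * r * math.exp(-LAM * r)


def d2eps_dr2(r: float) -> float:
    """Analytic second derivative of eps_cal."""
    return (D - 1.0) / 2.0 * LAM**3 * (2.0 * r - LAM * r * r) * math.exp(-LAM * r)


def central_first(f, r: float, h: float = 1e-5) -> float:
    """Central-difference estimate of f'(r)."""
    return (f(r + h) - f(r - h)) / (2.0 * h)


def central_second(f, r: float, h: float = 1e-4) -> float:
    """Central-difference estimate of f''(r)."""
    return (f(r + h) - 2.0 * f(r) + f(r - h)) / (h * h)


if __name__ == "__main__":
    for r in (1.0, 3.0, 6.0, 12.0):
        print(r, deps_dr(r), central_first(eps_cal, r),
              d2eps_dr2(r), central_second(eps_cal, r))
```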
RESULTS

It has been proposed that the conformational properties of the AAF-bound region in a double-stranded DNA molecule are important factors in the process of carcinogenesis (ref. 10). We first studied the conformational and dynamic properties of the C8-bound guanosine adduct of AAF (dGuo-AAF). The molecular mechanics parameters reported by Weiner et al. (ref. 1) were used. Additional parameters and charges were necessary for AAF (to be published). Coordinates of the guanine-AAF adduct are from Neidle et al. (ref. 11). The nomenclature used for dGuo-AAF is shown below.


For the conformational study, systematic minimizations were carried out as a function of two dihedral angles, each varied from 0° to 360° in steps of 30°. The first, α, is defined by N9-C8-NA2-CA2; the second, χ, is defined by O1'-C1'-N9-C8. The angle γ, defined by C8-NA2-CA14-CA15, was kept near 180°, in agreement with the results of Evans and Miller (ref. 12).
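The systematic scan can be sketched as a grid of starting dihedral triples. α and χ each take the 12 values 0°, 30°, ..., 330°, because 360° coincides with 0°. This gives 144 starting points, with γ held at 180°. The minimizer itself is not shown, and `starting_points` is a hypothetical helper.

```python
from itertools import product

STEP = 30          # degrees, as in the text
GAMMA_FIXED = 180  # gamma (C8-NA2-CA14-CA15) kept near 180 degrees


def starting_points(step: int = STEP):
    """Return (alpha, chi, gamma) triples for the systematic minimization grid."""
    angles = range(0, 360, step)  # 0, 30, ..., 330; 360 would duplicate 0
    return [(alpha, chi, GAMMA_FIXED) for alpha, chi in product(angles, angles)]


if __name__ == "__main__":
    grid = starting_points()
    print(len(grid), grid[:3])
```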

Calculations (with ε_cal, SCEE = 1.0) gave a minimization map with twelve main minima. Molecular dynamics runs (300 K, 50 ps) starting from each minimum revealed that the four domains centered on α = 0° and χ = -100° were connected by low conformational barriers. During molecular dynamics in each of the twelve domains, the dGuo-AAF sugar ring preferentially adopted the C2'-endo pucker. However, the pseudorotation phase angle ranged throughout the southern hemisphere of the pseudorotation circle (from O1'-endo to O1'-exo), while the amplitude of pucker remained centered on 39°.
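The phase angle and amplitude quoted here follow the Altona-Sundaralingam pseudorotation description. Below is a minimal sketch of that standard calculation from the five endocyclic torsions ν0...ν4, where ν0 is C4'-O1'-C1'-C2' in the O1' nomenclature used here. Using atan2 and hypot avoids dividing by cos P. The test torsions are synthetic: they are generated from the model ν_j = ν_max cos(P + 144°(j - 2)), not taken from the simulations.

```python
import math

SUM_SIN = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(nu):
    """Return (P, amplitude) in degrees from endocyclic torsions nu0..nu4 (degrees)."""
    nu0, nu1, nu2, nu3, nu4 = nu
    num = (nu4 + nu1) - (nu3 + nu0)  # equals 2 * amp * sin(P) * SUM_SIN
    den = 2.0 * nu2 * SUM_SIN        # equals 2 * amp * cos(P) * SUM_SIN
    phase = math.degrees(math.atan2(num, den)) % 360.0
    amplitude = math.hypot(num, den) / (2.0 * SUM_SIN)
    return phase, amplitude


def model_torsions(phase, amplitude):
    """Ideal torsions nu_j = amp * cos(P + 144 deg * (j - 2)), j = 0..4."""
    return [amplitude * math.cos(math.radians(phase + 144.0 * (j - 2)))
            for j in range(5)]


if __name__ == "__main__":
    # C2'-endo (P = 162 deg) with the 39 deg amplitude reported in the text
    print(pseudorotation(model_torsions(162.0, 39.0)))
```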
With the program FRODO (ref. 13), guanosine G4 in the hexamer d(CGCGCG)2 (coordinates from Wang et al., ref. 14) was replaced by each of the two minimized conformations of dGuo-AAF that are compatible with the Z-DNA structure.

The stereo views below show examples of conformations of the AAF-modified hexamer after minimization (with ε_cal, SCEE = 1.0). In the first stereo view, corresponding to domain 9, α and χ are 18° and -87°. In the second, corresponding to domain 8, they are -13° and -64°.
An important result was the formation of a hydrogen bond after minimization in domain 9, but only with ε_cal. This bond links the oxygen OA14 of AAF to an amino hydrogen on N4 of cytosine C5. Minimizations starting from domain 8, whatever the electrostatic parameters, led to a conformation in which such an interaction is impossible. Such an intramolecular H-bond between bound AAF and Z-DNA has not been reported before.

To study its stability, molecular dynamics simulations were performed starting from minimized structures of the hexamer in domain 8. Only residues C3-G4-C5 were allowed to move during the dynamics (AMBER belly option). The simulations were run with different distance-dependent dielectric constants and two values of the scale factor for 1-4 electrostatic interactions (SCEE).

Here we present the results obtained with ε(r) = ε_cal and ε(r) = 4r, each with SCEE equal to 1 or 2. The diagrams below display the variation of the torsion angles α and χ during the MD runs, between domain 8 (no H-bond) and domain 9 (with H-bond). They show that the stability of the intramolecular H-bond between OA14 (AAF) and H-N4 (C5) depends strongly on the choice of electrostatic parameters.
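In AMBER, the 1-4 electrostatic energy is divided by SCEE, so SCEE = 2.0 halves those terms relative to SCEE = 1.0. Below is a minimal sketch of that scaling for a single 1-4 pair. It uses the assumed Lavery et al. form (λ = C/H) and ε = 4r. The charges and the 3 Å separation are hypothetical example values. At such short range ε_cal (about 8.2) is smaller than 4r (12), so 1-4 electrostatics weigh more heavily with ε_cal.

```python
import math


def eps_cal(r: float, d: float = 78.0, lam: float = 2.674 / 7.5) -> float:
    """Assumed Lavery et al. dielectric (lam = C / H)."""
    x = lam * r
    return d - (d - 1.0) / 2.0 * (x * x + 2.0 * x + 2.0) * math.exp(-x)


def eps_4r(r: float) -> float:
    """Linear distance-dependent dielectric."""
    return 4.0 * r


def e14_elec(qi: float, qj: float, r: float, eps, scee: float) -> float:
    """1-4 electrostatic energy (kcal/mol), divided by the scale factor SCEE."""
    return 332.0 * qi * qj / (eps(r) * r) / scee


if __name__ == "__main__":
    qi, qj, r = -0.5, 0.3, 3.0  # hypothetical 1-4 pair
    for eps in (eps_cal, eps_4r):
        for scee in (1.0, 2.0):
            print(eps.__name__, scee, round(e14_elec(qi, qj, r, eps, scee), 3))
```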


[Figure: torsion angles α and χ (degrees) during the MD runs, e.g. for ε(r) = 4r, SCEE = 1.0.]
Thus, with ε(r) = 4r and SCEE = 1.0, the AAF remained in domain 8 for most of the simulation. With ε(r) = ε_cal and SCEE = 1.0, by contrast, it soon flipped to domain 9 and stayed there, forming the intramolecular H-bond. With SCEE = 2.0, oscillations between the two domains are more frequent and the conformation with the intramolecular H-bond seems favoured. However, independently of the parameters chosen for the MD simulations, the dGuo-AAF sugar pucker
adopted  the  Cj’-endo-Ol’-exo  conformation  with  an  amplitude  of  39°.  In  contrast, 
during MD simulations of the unmodified Z-hexamer, large variations were observed in the phase and amplitude of pseudorotation of the guanine sugars (ref. 15).

In conclusion, for charged molecules such as nucleic acids, the electrostatic parameters strongly influence the preferred conformations and their dynamics. Minimizations and MD simulations should therefore be run systematically with different values of these parameters.

It is known that covalent binding of AAF to poly d(G-C) stabilizes the Z conformation (ref. 16). The present calculations reveal an intramolecular H-bond between AAF, covalently linked to guanosine C8, and the amino group of the 3'-proximal cytosine. This H-bond probably contributes to the stability of AAF-modified Z-DNA.
ABSTRACT

The comparative binding affinities of five polymethylene carboxamide derivatives of 9-aminoacridine for a series of double-stranded hexanucleotides are computed theoretically. The purpose of this investigation is to ascertain whether minor-groove recognition of a guanine base adjacent to the intercalation site can occur, and be preferentially stabilized, for a given length of the polymethylene side chain, from n = 2 to n = 6 methylene groups.

For that purpose, several representative sequences were investigated, in which the 9-aminoacridine chromophore intercalated at a central d(CpG) or d(TpA) step. These were the self-complementary sequences d(CGCGCG)2, d(GCCGGC)2, d(TATATA) and d(ATTAAT)2, and the "mixed" sequences d(ACTAAT)·d(ATTAGT) and d(TGTATA)·d(TATACA).

For n = 3 to n = 6, such recognition was possible only when the guanine base was located downstream of the intercalation site, i.e. with the steps d(CGG) and d(TAG). It occurred by means of a bidentate interaction. On the base side, this involves H(N2) and N3; on the ligand side, the carbonyl oxygen and the cis amino hydrogen of the terminal formamide moiety. Because of the flexibility of the side chain, however, alternative binding modes were also found to compete, involving only interactions of the side chain with the backbone.

On the basis of the present computations with the SIBFA procedure [1-2], an optimal value of n = 5 could be derived for binding to the sequence d(GCCGGC)2. The corresponding acridine derivative shows both a significant prevalence of the bidentate over the backbone-only binding mode and the most favourable energy balance within the investigated series (see Table I). This privileged value of n = 5 is fully consistent with the experimental results of Markovits et al. and Gaugain et al. [3-5]. However, the very flexibility of the side chain prevented any preferential recognition of a triplet sequence with a downstream guanine, such as d(CGG) or d(TAG), over sequences such as d(TAA), d(TAT) or d(TAC).

A  more  complete  account  has  been  submitted  for 
publication  by  the  authors  to  Nucleic  Acids  Research. 
TABLE I

Values of the binding energetics in the energy-minimized complexes of N2-N6 with the self-complementary sequence d(GCCGGC)2. Energies in kcal/mole.

d(GCCGGC)2 - Bidentate binding mode

            N3        N4        N5        N6
ΔE       -252.4    -254.4    -260.0    -256.6
δlig       19.6      12.3      14.8      18.9
δE       -232.8    -242.1    -245.2    -237.7
δ          12.4       3.1       0.0       7.5

d(GCCGGC)2 - Nonbidentate binding mode

            N2        N3        N4        N5        N6
ΔE       -243.6    -240.0    -251.3    -258.0    -258.5
δlig        9.9      17.7      16.0      16.3      17.4
δE       -233.7    -222.3    -235.3    -241.7    -241.1
δ          11.5      22.9       9.0       3.5       4.1
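The columns of Table I appear to be related in a simple way, and a short sketch can check this against the transcribed values. It assumes two relations inferred from the printed numbers, not stated in the text. First, δE = ΔE + δlig. Second, the last column (δ) is δE measured from the lowest δE over both binding modes.

```python
# Entries transcribed from Table I (kcal/mol): name -> (Delta E, delta_lig)
BIDENTATE = {"N3": (-252.4, 19.6), "N4": (-254.4, 12.3),
             "N5": (-260.0, 14.8), "N6": (-256.6, 18.9)}
NONBIDENTATE = {"N2": (-243.6, 9.9), "N3": (-240.0, 17.7), "N4": (-251.3, 16.0),
                "N5": (-258.0, 16.3), "N6": (-258.5, 17.4)}


def energy_balance(rows):
    """Assumed relation: delta_E = Delta_E + delta_lig (rounded to 0.1 kcal/mol)."""
    return {name: round(de + dlig, 1) for name, (de, dlig) in rows.items()}


def relative_to_best(*tables):
    """Shift every delta_E by the lowest value found across all binding modes."""
    best = min(v for table in tables for v in table.values())
    return [{k: round(v - best, 1) for k, v in table.items()} for table in tables]


if __name__ == "__main__":
    bi, non = energy_balance(BIDENTATE), energy_balance(NONBIDENTATE)
    rel_bi, rel_non = relative_to_best(bi, non)
    print("bidentate:", bi, rel_bi)
    print("nonbidentate:", non, rel_non)
```

On the transcribed values, this reproduces every printed δE. It also reproduces every entry of the last column except one: N4 in the nonbidentate mode comes out as 9.9 rather than the printed 9.0.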
SUMMARY

Footprinting experiments showed that the antitumour drug N-methyl-9-hydroxyellipticine (NMHE) binds preferentially to CG sites in double-stranded DNA. They also led to the determination of a consensus sequence, ACGT (B. RENE and co-workers).

To elucidate this selectivity, we have carried out a ¹H NMR study and molecular modelling of the binding of NMHE to two DNA octamers. The first, TGACGTCA, contains the consensus sequence; the second, ACTGCAGT, is its inverse sequence. The two fragments, which are both self-complementary, were first studied without the drug.

The coupling constants J1'2', J1'2'', J2'3' and J2''3' were measured in COSY experiments. From these, we have treated each sugar ring as the result of an equilibrium between the C2'-endo and C3'-endo conformations, represented by the percentage of the C2'-endo form (1). We also carried out constrained minimizations, using the experimental couplings and the interproton distances determined from NOESY experiments (H6/H8 - H1', H2', H2'', inter- and intra-nucleotide).

The work on the binding of the drug to the fragments is in progress.


MATERIALS  AND  METHODS 

Oligonucleotides TGACGTCA (I) and ACTGCAGT (II) were synthesized at the I.R.S.C. (Villejuif) by the solid-phase procedure on an Applied Biosystems 380 B automatic synthesizer. Final purification was performed by HPLC, and purity was checked by NMR. Samples were prepared in deuterated phosphate buffer (pH 7, I = 0.1, and 0.2 mM EDTA). Oligonucleotides were lyophilized twice in ²H₂O.

Nuclear Magnetic Resonance: data collection and processing.

Two-dimensional proton NMR spectra were collected on Bruker spectrometers at two fields: 500 MHz (AM500; Laboratoire de Chimie Organique, Jussieu) and 300 MHz (MSL300; I.R.S.C. Villejuif).

Coupling constants: the ¹H-¹H vicinal coupling constants J1'2' and J1'2'' were measured on well-resolved H1' resonances in 1D spectra. J2'3' and J2''3' were extracted from phase-sensitive double-quantum-filtered COSY spectra recorded with high resolution in the F2 dimension (12 Hz/pt), as described elsewhere (1).
Distance constraints. Two-dimensional phase-sensitive NOESY spectra in ²H₂O were recorded at 300 MHz with mixing times of 100, 150 and 200 ms. The data in the t1 and t2 dimensions were apodized with a 90° phase-shifted sine bell before Fourier transformation. The volume of each resolved cross-peak was measured from the 2D NOESY spectra. A cytidine H5-H6 distance of 2.45 Å was used as a yardstick for all NOEs between nonexchangeable protons.
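A cross-peak volume can be converted into a distance with the isolated-spin-pair approximation. This assumes that NOE volumes scale as r⁻⁶ relative to the H5-H6 reference, and it neglects spin diffusion, which is why short mixing times are preferred. The sketch below illustrates that calibration. It is not necessarily the authors' exact procedure, and the volumes in the example are made up.

```python
def noe_distance(volume: float, ref_volume: float, ref_distance: float = 2.45) -> float:
    """Isolated-spin-pair estimate r = r_ref * (V_ref / V) ** (1/6).

    ref_distance defaults to the cytidine H5-H6 distance (2.45 angstrom) used as yardstick.
    """
    if volume <= 0.0 or ref_volume <= 0.0:
        raise ValueError("cross-peak volumes must be positive")
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


if __name__ == "__main__":
    ref = 100.0  # hypothetical H5-H6 cross-peak volume (arbitrary units)
    for vol in (100.0, 20.0, 5.0, 1.5625):
        print(vol, round(noe_distance(vol, ref), 2))
```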

Molecular mechanics calculations and graphics manipulations were carried out on a Silicon Graphics IRIS 4D 70/GT workstation. We used the program AMBER (2) (Kollman et al.) together with two programs developed in the laboratory: the energy minimizer ORAL and NACAD (Nucleic Acid Computer-Aided Design).

The parameters were those described by Weiner et al. (3) (1986) and by Singh et al. (4) (1986). All hydrogen atoms were treated explicitly. To simulate the screening effect of the solvent, a gas-phase potential was employed in which the dielectric constant D_ij is proportional to the distance d_ij separating a pair of atoms: D_ij = C·d_ij (Gelin & Karplus, 1981; Weiner et al., 1984) (5,6). C was taken as 4 Å⁻¹. All pairs were included in the calculation of nonbonded interactions. The minimizations were carried out with the 1-4 interatomic interactions divided by two. Refinements were terminated when the norm of the energy gradient was less than 0.01 kcal mol⁻¹ Å⁻¹.


RESULTS  AND  DISCUSSION 

The experimental coupling constants, given in Table 1, show that the two fragments differ significantly in the geometries (puckering) of their sugars.


(I) TGACGTCA    J1'2' (a)   J1'2'' (a)   J2' (b)   J2'' (c)
T1                 8.6         5.2         29.0      20.2
G2                10.1         5.5         26.6      21.1
A3                 9.3         5.3         28.0      21.1
C4                 8.5         5.8         30.5      23.7
G5                 8.2         6.2         28.0      22.7
T6                 7.3         7.3         30.5      22.7
C7                 7.0         7.0         29.0      24.0
A8                 6.8         6.8         28.0      24.2

(II) ACTGCAGT   J1'2' (a)   J1'2'' (a)   J2' (b)   J2'' (c)
A1                 7.8         6.2         28.0      20.8
C2                 7.3         7.3         28.0      22.0
T3                 8.5         6.1         29.0      23.0
G4                 9.6         5.8         26.5      23.0
C5                 7.8         6.1         28.0      22.0
A6                 8.7         6.2         30.0      22.0
G7                 7.3         7.3         28.0      22.5
T8                 7.0         7.0          d         d

Table 1. ¹H-¹H coupling constants (Hz) for octamers (I) and (II).

a: measured on 600 MHz 1D spectra at 30 °C, with two exceptions. T1 and C7 were measured at 24 °C (accuracy ±0.365 Hz); C2 and G7 were measured on the H1'/H2' COSY cross-peak (accuracy ±1.2 Hz).
b: measured on the H1'/H2' COSY cross-peak (accuracy ±1.2 Hz).
c: measured on the H1'/H2'' COSY cross-peak (accuracy ±1.2 Hz).
d: H2' and H2'' strongly overlap.
For both molecules, a set of 64 = 2⁶ starting structures was generated by considering the C2'-endo and C3'-endo conformations for each sugar ring except those at the ends. These models were refined without constraining the geometry of the deoxyribose. We selected the ten best in terms of energy (within 5 kcal of one another), calculated their coupling constants and compared them with the experimental constants, considering each nucleotide independently. For most nucleotides, the calculated couplings agreed well with the experimental ones (pure C2'-endo).

For the others (A3, C4, G5), we had to consider the C2'-endo/C3'-endo equilibrium. Taking the C2'-endo form of one sugar, we tried to find another structure in which the same sugar adopts the C3'-endo conformation, and a fraction p such that, for the 4 couplings:

$$p\,J_{C2'\text{-endo}} + (1-p)\,J_{C3'\text{-endo}} = J_{\text{exp}}$$

We  ruled  out  the  structures  for  which  it  was  not  possible  to  find  such  a  fit  for  all  nucleotides. 
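One way to carry out this fit is ordinary least squares over the four couplings, which gives p = Σ(J2 - J3)(Jexp - J3) / Σ(J2 - J3)². A structure pair is then kept only if every coupling is reproduced within a tolerance. Several choices here are our own assumptions, not the procedure stated in the text: the least-squares criterion, clipping p to [0, 1], the hypothetical `accept` helper, and its 1.2 Hz default tolerance (borrowed from the COSY accuracy quoted in the footnotes to Table 1).

```python
def fit_fraction(j_c2, j_c3, j_exp):
    """Least-squares p for p*J(C2'-endo) + (1 - p)*J(C3'-endo) = J(exp).

    Arguments are sequences of the same couplings (e.g. J1'2', J1'2'', J2'3', J2''3')
    in Hz. Returns (p, largest absolute residual in Hz); p is clipped to [0, 1].
    """
    diffs = [a - b for a, b in zip(j_c2, j_c3)]
    targets = [e - b for e, b in zip(j_exp, j_c3)]
    denom = sum(d * d for d in diffs)
    if denom == 0.0:
        raise ValueError("C2'-endo and C3'-endo couplings are identical")
    p = sum(d * t for d, t in zip(diffs, targets)) / denom
    p = min(1.0, max(0.0, p))
    residual = max(abs(p * a + (1.0 - p) * b - e)
                   for a, b, e in zip(j_c2, j_c3, j_exp))
    return p, residual


def accept(j_c2, j_c3, j_exp, tol=1.2):
    """Hypothetical acceptance test: all couplings reproduced within tol (Hz)."""
    _, residual = fit_fraction(j_c2, j_c3, j_exp)
    return residual <= tol
```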

In the end, no model remained. Constrained minimizations were therefore run to determine the structures by another method. Experimental


a (TGACGTCA)

Model                                                            E (kcal/mol)   Distance error sum   Coupling error sum
(1) Arnott (minimized)                                               -196              3.7                 18.7
(2) Arnott with constraints on sugar torsion angles                  -192              3.3                 15.6
(3) Arnott with constraints on interproton distances                 -200              3.2                 18.6
(4) (2) + (3)                                                        -198              3.4                 21.5
(5) Modified twist+pucker from Rinkel & Altona without constraints   -206              3.7                 13.9
(6) (5) with constraints on the delta angle                          -189              3.3                 16.0

b (ACTGCAGT)

Model                                                            E (kcal/mol)   Coupling error sum
(1) Arnott (minimized)                                               -188              16.7
(2) Arnott with constraints on sugar torsion angles                  -188              13.5
(3) Modified twist+pucker from Rinkel & Altona without constraints   -192              12.6
(4) (3) + constraints on the delta angle                             -192              13.1

Table 2. a: Energies and sums of errors on the 54 distances and 24 couplings (J1'2', J1'2'', J2'3' and J2''3') for several refined models (TGACGTCA).
b: Energies and sums of errors on the 24 couplings J1'2', J1'2'', J2'3' and J2''3' (ACTGCAGT).
a (TGACGTCA)

Step   Theoretical value (Sarai et al.)   Helical twist for the model
T-G                 29                            37.5
G-A                 38                            34.5
A-C                 40                            39.7
C-G                 23                            35.7
G-T                 40                            39.7
T-C                 38                            34.5
C-A                 29                            37.5

b (ACTGCAGT)

Step   Theoretical value (Sarai et al.)   Helical twist for the model
A-C                 40                            40.9
C-T                 35                            34.3
T-G                 29                            33.8
G-C                 38                            40.3
C-A                 29                            33.8
A-G                 35                            34.3
G-T                 40                            40.9

Table 3. Helical twist (degrees) for the refined models obtained from Arnott structures with modified twists: a - TGACGTCA, b - ACTGCAGT.

couplings (24 constants for each fragment) were converted into torsion angles using the generalized Karplus equation (7). These were introduced as constraints in the calculations, together with 54 interproton distances (for TGACGTCA only). Different combinations were used, as shown in Table 2: no constraints, constraints on the torsion angles and/or on the distances, and different starting structures.

The most satisfactory model came from the calculations using constraints and, interestingly, a modified helical twist, as shown in Table 2. We started from an Arnott structure (8) in which the puckers had already been set to the values obtained with the method of Rinkel & Altona. To it we applied the theoretical helical twists determined by Sarai et al. (9). We then refined these starting structures with and without constraining the δ angle, which determines the sugar puckering. Stereoscopic views of the two fragments (refined structures with modified twist, without constraining δ) are given in Fig. 1, and the helical twists are shown in Table 3.

It is also interesting to note that the symmetry of the two strands of the fragments is conserved in all calculations. Moreover, Table 3 shows that in both fragments the twist is "periodic", with a period of 2 for TGACGTCA and of 3 for ACTGCAGT. Nevertheless, the twist values calculated for the refined models differ significantly from those described by Sarai et al. (9).

These models proved to be among the most favourable energetically, and they agree well with the experimental couplings and interproton distances.
CONCLUSION

In agreement with the NMR results, molecular mechanics calculations have shown that the two octameric oligonucleotides have significantly different structures:

-  in  terms  of  sugar  puckering  :  whereas  TGACGTCA  presents  a  certain  variety  of  puckers 
in  the  range  Ql'-endo,  ACTGCAGT  appears  more  homogenous  (P  =  150°  for  all  nucleotides) ; 

-  in  terms  of  helical  parameters  :  ACTGCAGT  presents  larger  variations  in  the  helical  twist 
than  TGACGTCA. 

We  are  now  investigating  other  parameters  (propeller  twist...)  in  order  to  get  further  details 
on  the  fine  structure  of  the  two  octamers,  which  may  help  understand  their  recognition  by  the 
drug  NMHE. 
NUCLEOTIDE MISPAIRS STABILIZED BY WATER BRIDGES.

MODELING  OF  STRUCTURE  AND  ROLE  IN  TEMPLATE  BIOSYNTHESIS. 
SUMMARY

We have examined the possibility that mispairs form via water bridges, in which a single water molecule forms H-bonds with both bases of the same pair. The aims were threefold: to explain the experimental data on the structure of double-helical oligonucleotides containing mispairs, to study the molecular mechanisms of point errors in template-directed nucleic acid biosynthesis, and to develop a conformational theory of wobble in codon-anticodon recognition.

For every combination of nucleotides, several energy minima were found with the two bases in the same plane and linked by the same water molecule. Other water molecules can additionally stabilize some of these pairs. Theoretical conformational analysis of DNA fragments containing some of these pairs suggests that this mechanism plays a role in the generation of errors during nucleic acid biosynthesis. Calculations on the system "codon + anticodon loop + water molecule" reveal energy minima for every permissible wobble mispair in the third position of the codon-anticodon complex. In these calculations the codon was in the A-RNA conformation, and the configuration of the anticodon loop was rather similar to that of yeast Phe-tRNA.

INTRODUCTION 

Besides the Watson-Crick A:T (A:U) and G:C pairs, nucleic acid bases can form many other "incorrect" planar pairs. The formation of such mispairs gives rise to many point errors in replication, repair, transcription and translation. Moreover, such pairs appear in the third position of the codon-anticodon complex and in the tertiary structure of RNA. It is therefore very important to investigate all possible ways of forming planar base pairs. For every pair of nucleic acid bases in their usual tautomeric forms, there exist minima of the intermolecular interaction energy that allow the pair to fit within the double helix (ref. 1-3). In many cases, both spontaneous and base-analog-induced errors during nucleic acid biosynthesis can be explained by the formation of these pairs. However, for some base combinations, pairs constructed by means of water bridges are apparently more likely than pairs without such bridges. In some of these water-bridged pairs, particularly pyrimidine-pyrimidine pairs, the mutual arrangement of the bases is closer to that in Watson-Crick pairs than in pairs without water molecules.
This paper has two parts. The first is a search for energy minima of systems consisting of two bases and one or two water molecules. The second is the calculation of low-energy conformations of double-helical DNA and RNA fragments containing some pairs with water bridges. The results are applied to three problems: accounting for experimental data on the structure of oligonucleotide duplexes with mispairs in solution, elucidating how point errors arise during nucleic acid biosynthesis, and developing a conformational theory of wobble in codon-anticodon recognition. Some of the results presented here have been published in more detail earlier (ref. 4-7).

METHODS 

Potential  functions. 

The energies of intra- and intermolecular interactions were calculated by the method of atom-atom potential functions, using the procedures and parameters described and used earlier (ref. 2, 5-8). The total energy was the sum of the energies of all atom-atom interactions. The dependence of the interaction energy on interatomic distance was approximated by two forms. Interactions between hydrogen atoms bound to N or O and proton-acceptor atoms (N or O) used (1-10-12)-potentials; all other interactions used (1-6-12)-potentials. The phosphate groups were neutralized, i.e. the charge on each of the two phosphate oxygens was reduced by half an electron charge. Torsional potentials for rotation about single bonds and the energy of sugar-ring bond-angle distortion were also taken into account, in addition to the intramolecular nonbonded interactions.
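Below is a sketch of the two functional forms. It reads "1" as the Coulomb term in 1/r and "6-12" and "10-12" as the powers of the attractive and repulsive terms. The coefficients a and b, the dielectric value and the charges are placeholders, since the parameter sets of refs. 2, 5-8 are not given in this passage. The helper `nonpolar_minimum` gives the analytic minimum of the uncharged term: r = (2b/a)^(1/6) for 6-12 and r = (1.2 b/a)^(1/2) for 10-12.

```python
COULOMB = 332.0  # kcal mol^-1 angstrom e^-2


def pair_energy(r, qi, qj, a, b, eps=1.0, hbond=False):
    """Atom-atom energy (kcal/mol) at distance r (angstrom).

    (1-6-12):  332 qi qj / (eps r) - a / r**6  + b / r**12
    (1-10-12): 332 qi qj / (eps r) - a / r**10 + b / r**12, for H (on N or O) ... N or O
    a, b and eps are placeholders; the source's parameter values are not reproduced.
    """
    attraction = a / r**10 if hbond else a / r**6
    return COULOMB * qi * qj / (eps * r) - attraction + b / r**12


def nonpolar_minimum(a, b, hbond=False):
    """Distance of the minimum of the uncharged 6-12 or 10-12 term."""
    return (1.2 * b / a) ** 0.5 if hbond else (2.0 * b / a) ** (1.0 / 6.0)


if __name__ == "__main__":
    r6 = nonpolar_minimum(500.0, 4.0e5)               # hypothetical coefficients
    r10 = nonpolar_minimum(6000.0, 4.0e4, hbond=True)  # hypothetical coefficients
    print(r6, pair_energy(r6, 0.0, 0.0, 500.0, 4.0e5))
    print(r10, pair_energy(r10, 0.0, 0.0, 6000.0, 4.0e4, hbond=True))
```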

The  calculated  systems  and  the  variable  parameters. 

(i) The base pairs. The systems contained two bases and one (or two) water molecules. The energy of the system was calculated as a function of 11 (or 17) variables. Of these, 5 variables determined the position of one base relative to the other, and 6 (or 12) the position of the one (or two) water molecules.

As a rule, both bases and the water oxygens were in the same plane. Increasing the number of variables (including propeller twist and buckle of the bases, and allowing the water oxygens to move out of the plane) did not produce a considerable shift of the energy minimum. The initial mutual positions of the bases and water molecules were determined by means of space-filling models, after which energy minimization was performed. When the energy minimum had been reached, the distance R between the C1' pentose atoms of the nucleotide pair of an imaginary duplex containing this pair was calculated, together with the angles A1 and A2 between the bonds C1'-N9 (of purines) or C1'-N1 (of pyrimidines) and the line passing through both C1' atoms.
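The geometric quantities R, A1 and A2 can be computed from Cartesian coordinates as sketched below. This is an illustration under one assumed convention: each angle is measured at the C1' atom of its own nucleotide, between the glycosyl bond C1'-N and the direction towards the other C1' atom. The function names and the coordinates in the check are made up.

```python
import math


def _sub(p, q):
    """Vector p - q."""
    return [x - y for x, y in zip(p, q)]


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _angle_deg(u, v):
    """Angle between vectors u and v, in degrees."""
    cos_t = sum(x * y for x, y in zip(u, v)) / (_norm(u) * _norm(v))
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_t))


def pair_geometry(c1_a, n_a, c1_b, n_b):
    """Return (R, A1, A2) for a base pair.

    c1_a, c1_b: C1' coordinates of the two nucleotides (Angstrom).
    n_a, n_b:   glycosidic N (N9 of a purine, N1 of a pyrimidine).
    Assumed convention: A1 (A2) is the angle at C1' of nucleotide a (b)
    between the C1'-N bond and the direction to the other C1' atom.
    """
    c1_to_c1 = _sub(c1_b, c1_a)
    r = _norm(c1_to_c1)
    a1 = _angle_deg(_sub(n_a, c1_a), c1_to_c1)
    a2 = _angle_deg(_sub(n_b, c1_b), _sub(c1_a, c1_b))
    return r, a1, a2
```

For example, made-up coordinates that place both glycosyl bonds at 55° to a 10.5 Å C1'-C1' line give R = 10.5 and A1 = A2 = 55, up to rounding.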

(ii) Deoxynucleotide duplexes. The system consisted of two trinucleotides and one water molecule. First, the low energy B-conformation of the (dG)3:(dC)3 duplex was determined; the middle pair was then replaced by the water bridged pair (C:U or U:U). The following variables were taken as independent for calculating the atom coordinates:

1) six variables determining the position of a trinucleotide as a rigid body;

2) six variables determining the position of the water molecule;

3) 10 variables per nucleotide determining its conformation, 4 of which determined the conformation of the sugar ring. All bond lengths and all bond angles (except the angles within the sugar ring) were assumed to be constant.


(iii) Codon-anticodon complexes. The codon was represented by the triribonucleotide 5'-CCX-3' and the tRNA by the tetraribonucleotide 5'-UYGG-3', corresponding to the 33-36 fragment of the anticodon loop. First, the low energy conformation of the complex CCC:UGGG was determined, where CCC had the A-form conformation and UGGG a conformation similar to that of the corresponding fragment of yeast tRNA(Phe). The third pair of the codon-anticodon complex was then replaced by a pair with a water bridge. During energy minimization only the conformation of the anticodon loop was varied; the codon was always kept in the A-RNA conformation. The same independent variables were chosen as for the trideoxynucleotides, plus one additional variable per nucleotide determining the orientation of the O2'H group.


RESULTS AND DISCUSSION
Base  pairs  with  water  bridges. 

For the systems containing two bases and one or two water molecules, at least one minimum with water bridges was found for every pair of bases. The mutual arrangement of the constituents in some of these minima is shown in Figure 1, and the corresponding structural and energetic characteristics are listed in Table 1. Here we describe the minima in which the mutual arrangement of the glycosyl bonds of the bases is rather close to that in Watson-Crick pairs. In other publications (refs. 5-7) we present and discuss the results of calculations for many other such minima. The mutual arrangement of the bases in minimum 10 (A:U), with a water bridge consisting of two water molecules, is the same as in Watson-Crick pairs. We use the characteristics of this minimum as the standard for comparison with those of other minima. In many minima the bases are linked via one H-bond and one or two monomolecular water bridges. In some minima the second water molecule additionally stabilizes the configuration by linking with the first water and with one of the bases (e.g. minimum 1). We have also found minima with two H-bonds between the bases, in which the water bridges act as additional stabilizers of the structure. The most striking example of such a configuration is the G:U wobble pair (minimum 8).

The same mutual arrangement of the bases and water molecules has been found in crystals of the double helical oligonucleotides d(GGGGTCCC) and d(GGGGCTCC) (ref. 9). We suggest that the formation of the water bridges is one of the reasons why the G:T (or G:U) pair is the most stable of all "incorrect" pairs.

TABLE 1

Energy and structure characteristics of water bridged base pairs(a).

No  Bases   -Eb    -E1    -E2    R      A1   A2   B1        B2        H1        H2
1   U:C     5.0    5.1    11.7   11.5   32   45   O4        N4H       N3H       N3,O2
    (w2)    5.6    4.3    0.8    -      -    -    -         -         O2        -
2   U:U     4.9    5.3    5.3    11.0   56   20   N3H       O4        O4        O4
    (w2)    0.0    4.9    6.3    -      -    -    -         -         O2        N3H
3   C:C     3.0    7.4    8.4    10.2   65   30   N3        N4H       O2        N3
    (w2)    0.2    3.4    5.7    -      -    -    -         -         N4H       N4H
4   A:C     4.1    6.2    7.6    11.9   23   42   N6H       N3        N1        O2
    (w2)    0.2    10.6   3.5    -      -    -    -         -         N7,N6H    N4H
5   G:U     7.1    6.9    4.2    12.1   85   34   N2H       O4        N1H       O4
    (w2)    0.0    6.3    4.2    -      -    -    -         -         N2H       N3H
6   G:A*    5.8    8.4    3.8    10.2   82   57   N2H       N7        N1H       N6H
7   U:A*    4.7    5.1    6.4    12.2   29   25   O4        N6H       N3H       N7
8   G:U     10.6   6.2    5.2    10.6   42   69   N1H,O6    O2,N3H    O6        O4
    (w2)    0.0    7.4    3.4    -      -    -    -         -         N2H       O2
9   m G:U   4.4    9.5    5.2    10.1   63   88   N2H       O2        N1,O6     N3H
    (w2)    5.6    0.8    5.0    -      -    -    -         -         -         O4
10  A:U     11.3   5.0    -0.8   10.7   53   55   N6H,N1    O4,N3H    N6H       -
    (w2)    6.1    -0.6   6.1    -      -    -    -         -         -         O4

(a) For each base pair, the line marked (w2) refers to the second water molecule of the complex "2 bases + 2 water molecules". The values of the base-base (Eb, first line), water-water (Eb, (w2) line), first base-water (E1) and second base-water (E2) interaction energies are given in kcal/mol. The distance R is given in Å and the angles A1 and A2 in degrees. The last four columns list the atom groups involved in H-bonding between the bases (B1 and B2) and between the bases and the water molecules (H1 and H2); a dash marks an empty entry.

Fig. 1. Water bridged base pairs. Filled circles are nitrogen atoms, open circles are oxygen atoms.


Fig. 2. Stereo views of low energy conformations of the complexes with water bridged mispairs d(GCG)·d(CUC) (top) and d(GUG)·d(CUC) (bottom). The water molecule forms H-bonds with the bases of the U:C (U:U) mispair and with guanine N3 (cytosine N1) of the neighbouring pair.


In some minima (e.g. minima 6 and 7 in Table 1) one of the bases has the syn orientation about the glycosyl bond.

The mutual arrangement of the glycosyl bonds in some pairs with water bridges is closer to that in Watson-Crick pairs than in the configuration corresponding to the minimum of the base-base interaction energy. This applies particularly to pyrimidine-pyrimidine pairs. We suggest that this is the main reason for the high probability of pyrimidine-pyrimidine pair formation as an error of nucleic acid biosynthesis, as revealed in several in vitro systems (ref. 10). The NMR data for some oligonucleotide duplexes containing "incorrect" pairs (refs. 11-13) can be explained by assuming the formation of base pairs with water bridges. These pairs correspond to minima 1, 4 and 9 in Table 1.


Fig.  3.  The  comparison  of  dihedral  angle  values  of  the  three  trinucleotide 
complexes. 


Conformations  of  deoxynucleotide  duplexes. 

To estimate the range of distortions of the sugar-phosphate backbone conformation caused by replacing a Watson-Crick base pair with a water bridged mispair, we performed conformational calculations on trinucleotide duplexes in which the flanking positions were occupied by G:C pairs and the middle position by the water bridged mispair. Here the results of calculations for the d(CUC)·d(GCG) and d(CUC)·d(GUG) complexes are presented (Figs. 2 and 3). The calculations reveal that, in these cases, the deviations of the backbone dihedral angles from those of the d(CCC)·d(GGG) complex are considerably smaller than for mispairs with two H-bonds between the bases (refs. 2, 3).

Conformations  of  codon-anticodon  complexes 

Analysis of the available experimental data suggests a functional asymmetry between the third nucleotide of the codon and the first nucleotide of the anticodon (ref. 4). For example, the C:U pair should be considered permitted and can be formed in mitochondria, whereas the U:C pair is always impossible. This contradicts Crick's wobble hypothesis, according to which both pairs should be either possible or impossible simultaneously.


Fig. 4. Stereo view of the low energy conformation of the codon-anticodon complex with a water bridged C:U pair in the third position. The water molecule forms H-bonds with the bases of the C:U pair and also with guanine N3 of the neighbouring pair.

To resolve this contradiction it was proposed (ref. 4) that (i) in the codon-anticodon complex the codon is fixed in the A-RNA conformation, and the rearrangement of the sugar-phosphate backbone needed for any wobble pair formation occurs only in the anticodon, and (ii) the two bases in the third position of the codon-anticodon complex can pair not only according to Crick's scheme but also via water bridges. In such a theory the first of the two pairs, C:U, becomes permitted and the second, U:C, appears sterically impossible. The calculations reveal that incorporating the water bridged pair in the third position of the codon-anticodon complex (Fig. 4) changes the energy of the anticodon loop by less than 1 kcal/mol. Such distortions of the loop should therefore be regarded as quite possible. Analysis of the possibility of incorporating pairs with water bridges in the third position of the codon-anticodon complex has made it possible not only to explain which pairs are possible and which are not, but also to predict the relative efficiencies of their formation.
SUMMARY

Ditercalinium is a synthetic antitumoral and cytotoxic bifunctional DNA intercalator containing a rigid chain that links its two pyridocarbazole aromatic moieties. Previous theoretical and experimental studies have shown that, for CG sequences, the complex in which the linking chain of the ligand lies in the major groove is favored over the minor groove complex. It was therefore of interest to find out whether this is also the case for other sequences. In this work the structures and binding energetics of the complexes formed between ditercalinium and a series of model tetradeoxynucleotide duplexes have been investigated by means of a molecular mechanics procedure developed especially for nucleic acid structures.

A comparative energy analysis of all the complexes allowed us to rank them by relative stability. The energy minimisation calculations point to an increased stabilisation of the minor groove complex over the corresponding major groove complex in the cases of, for example, d(TACG)2, d(CATA)2 and d(CATG)2. Other complexes, such as d(CACG)2, d(TACA)2 and d(CACA)2, are predominantly of the major groove type. A third class includes d(GCAT)2, d(ACAT)2 and d(ACGT)2, which are energetically indifferent to either groove.


INTRODUCTION 

In a series of papers we have attempted to understand and characterise the complexes formed between certain mono- and bis-intercalating substances belonging to the 7H-pyridocarbazole family and a number of model oligonucleotides. These substances show a large antitumoral activity and have a high affinity for DNA (1,2). This endeavour originated in part from the existence of ligands of this family which intercalate into DNA but which, surprisingly, do not possess any antitumoral activity (2). Our studies, theoretical in nature (3-5), have been backed in several instances by experimental NMR measurements performed on the same systems (6-8). In characterising the geometrical and energetic properties of the studied systems we have so far succeeded in finding agreement between theory and experiment. In the complexes studied previously, the oligocyclic aromatic moiety of the intercalating ligand was usually lodged in a dCpdG step. Non-covalent bifunctional intercalation of ditercalinium (denoted here as 202; Fig. 1) and its derivatives proceeded in all cases via the major groove of the oligonucleotide, i.e., the linking chain of the drug occupied the major groove side of the oligonucleotide.


Figure 1. Chemical structure of ditercalinium

We were curious, then, to find out whether intercalation from the major groove side is a rule for this family of ligands. For that purpose, the cytosine and guanine bases were replaced by, or combined with, thymine and adenine in a systematic fashion, always following an alternating (purine-pyrimidine)2 or (pyrimidine-purine)2 sequence. In all the complexes the two central base pairs are sandwiched between the chromophores intercalated at the terminal sites. The resulting complexes were studied by energy minimisation.


METHODS 

The theoretical methodology used in this work is the JUMNA molecular mechanics procedure, which has already been described in detail (9,10).


RESULTS  AND  DISCUSSION 

Table I shows the DNA-ligand stabilisation energies for the optimised complexes formed between ditercalinium and a series of model tetradeoxyribonucleotides with an alternating pyrimidine-purine or purine-pyrimidine pattern. In these complexes the linking side chain of the ligand is located either in the major groove (M) or in the minor groove (m). The methoxy group at position 10 of the drug chromophore is always oriented towards the central base pairs.
TABLE I

Summary of the energy terms for the optimal bis-intercalating complexes of ditercalinium with selected model oligonucleotides (values in kcal/mol)

No.   Nucleotide¹   Groove  ΔE DNA  ΔE LIG  E INTER   Es³      ΔEs²
1a)   CACA          M       59.4    4.0     -165.2    -102.2   -18.5
1b)                 m       57.6    0.5     -141.8    -83.7
2a)   CACG          M       59.9    0.4     -168.4    -108.1   -22.0
2b)                 m       57.6    0.4     -144.1    -86.1
3a)   CATA          M       50.7    1.6     -123.8    -71.5    41.8
3b)                 m       61.4    2.5     -177.2    -113.3
4a)   CATG          M       53.9    1.4     -122.4    -67.1    46.2
4b)                 m       63.4    2.7     -179.4    -113.3
5a)   CGCA          M       49.7    4.1     -151.7    -97.9    10.2
5b)                 m       42.6    1.2     -152.1    -108.3
6a)   CGCG          M       67.8    5.6     -185.9    -112.5   -1.3
6b)                 m       58.8    2.1     -172.1    -111.2
7a)   TACA          M       54.1    3.3     -160.0    -102.6   -15.0
7b)                 m       58.7    0.5     -146.8    -87.6
8a)   TACG          M       54.5    1.6     -161.9    -105.8   10.1
8b)                 m       59.9    0.6     -176.4    -115.9
9a)   TATA          M       39.5    4.1     -116.9    -72.3    35.8
9b)                 m       55.4    7.2     -170.7    -108.1
10a)  TGCA          M       41.4    3.3     -142.0    -97.3    -20.8
10b)                m       43.8    2.2     -122.5    -76.5
11a)  ACAC          M       53.9    1.0     -139.2    -84.3    -10.6
11b)                m       54.1    0.5     -128.3    -73.7
12a)  ACAT          M       50.1    1.3     -123.5    -72.1    2.8
12b)                m       51.7    0.6     -127.2    -74.9
13a)  ACGC          M       57.9    1.2     -131.0    -71.9    -7.5
13b)                m       55.8    -0.1    -120.1    -64.4
14a)  ACGT          M       48.3    1.5     -118.5    -68.7    -2.8
14b)                m       41.5    -2.7    -104.7    -65.9
15a)  ATAC          M       47.0    1.5     -140.8    -92.3    -18.1
15b)                m       44.2    2.0     -120.4    -74.2
16a)  ATAT          M       40.9    1.0     -129.1    -87.2    -8.4
16b)                m       33.1    3.9     -115.8    -78.8
17a)  GCAC          M       63.8    0.9     -146.8    -82.1    -13.5
17b)                m       60.0    0.7     -129.3    -68.6
18a)  GCAT          M       58.2    1.4     -135.8    -76.2    -1.5
18b)                m       60.8    -1.2    -134.3    -74.7
19a)  GCGC          M       70.9    2.4     -151.5    -78.2    -22.0
19b)                m       46.1    -1.5    -100.8    -56.2
20a)  GTAC          M       50.0    1.0     -148.0    -97.0    -23.7
20b)                m       46.3    2.2     -121.8    -73.3

¹ Nucleotide in complex. Bisintercalation takes place at the end sites BP1-BP2 and BP3-BP4 (BP: base pair).

² ΔEs: energy change of the major groove complex with respect to the minor groove complex, ΔEs = Es(M) - Es(m).

³ The total complex energy Es = ΔE DNA + ΔE LIG + E INTER, where ΔE DNA is the intramolecular DNA energy change with respect to free DNA, ΔE LIG is the intramolecular ligand energy change with respect to the free ligand, and E INTER is the intermolecular DNA-ligand energy.

Of the 20 pairs of complexes formed with alternating (PyPu)2 or (PuPy)2 tetranucleotides, ditercalinium bisintercalates from the major groove side (M) in 11, from the minor groove side (m) in 5, and indifferently (ΔEs within 3 kcal/mol) in 4. Figure 2 shows molecular graphics representations of complexes 4b and 20a.
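The counting rule in the preceding paragraph, and the ordering later used for Table II, can be reproduced from the Es column of Table I. In this sketch ΔEs is taken as Es(M) - Es(m), which matches the ΔEs column of Table I to within 0.2 kcal/mol, and the 3 kcal/mol window is taken from the text. The function names are ours. Writing the lower-energy groove first for indifferent cases is our assumption, chosen because it matches the "M,m" and "m,M" entries of Table II.

```python
from collections import Counter

# Es (kcal/mol) transcribed from Table I: sequence -> (major groove M, minor groove m)
ES = {
    "CACA": (-102.2, -83.7), "CACG": (-108.1, -86.1), "CATA": (-71.5, -113.3),
    "CATG": (-67.1, -113.3), "CGCA": (-97.9, -108.3), "CGCG": (-112.5, -111.2),
    "TACA": (-102.6, -87.6), "TACG": (-105.8, -115.9), "TATA": (-72.3, -108.1),
    "TGCA": (-97.3, -76.5), "ACAC": (-84.3, -73.7), "ACAT": (-72.1, -74.9),
    "ACGC": (-71.9, -64.4), "ACGT": (-68.7, -65.9), "ATAC": (-92.3, -74.2),
    "ATAT": (-87.2, -78.8), "GCAC": (-82.1, -68.6), "GCAT": (-76.2, -74.7),
    "GCGC": (-78.2, -56.2), "GTAC": (-97.0, -73.3),
}
WINDOW = 3.0  # kcal/mol: |dEs| within this window counts as "indifferent"


def preferred_groove(es_major, es_minor, window=WINDOW):
    """Label as in Table II; the lower-energy groove is written first."""
    d_es = es_major - es_minor  # dEs = Es(M) - Es(m)
    if abs(d_es) <= window:
        return "M,m" if d_es < 0 else "m,M"
    return "M" if d_es < 0 else "m"


def groove_counts(table):
    """Count major, minor and indifferent ("either") preferences."""
    counts = Counter()
    for es_major, es_minor in table.values():
        label = preferred_groove(es_major, es_minor)
        counts["either" if "," in label else label] += 1
    return dict(counts)


def ranking(table):
    """Sequences ordered by the energy of their more favorable complex."""
    return sorted(table, key=lambda seq: min(table[seq]))
```

With these values `groove_counts(ES)` gives 11 major, 5 minor and 4 indifferent cases, and `ranking(ES)` returns the sequences in the order listed in Table II.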


Figure 2. 'Relaxed' stereo views from the major groove side: a) 202/CATG (m), 202 in the minor groove of CATG; b) 202/GTAC (M), 202 in the major groove of GTAC.


Table II shows the result of choosing the more favorable complex, major or minor groove, from each pair of complexes in Table I and arranging these in order of decreasing stabilisation.


TABLE II

Arrangement, in order of decreasing stabilisation, of the most favored complex (minor groove or major groove) of each pair of complexes in Table I

NUCLEOTIDE  PREFERRED GROOVE  SEQUENCE PATTERN
TACG        m                 (PyPu)2
CATA        m                 (PyPu)2
CATG        m                 (PyPu)2
CGCG        M,m               (PyPu)2
CGCA        m                 (PyPu)2
CACG        M                 (PyPu)2
TATA        m                 (PyPu)2
TACA        M                 (PyPu)2
CACA        M                 (PyPu)2
TGCA        M                 (PyPu)2
GTAC        M                 (PuPy)2
ATAC        M                 (PuPy)2
ATAT        M                 (PuPy)2
ACAC        M                 (PuPy)2
GCAC        M                 (PuPy)2
GCGC        M                 (PuPy)2
GCAT        M,m               (PuPy)2
ACAT        m,M               (PuPy)2
ACGC        M                 (PuPy)2
ACGT        M,m               (PuPy)2

Table II shows that the most favorable complexes are those that i) follow the alternating (PyPu)2 sequence and ii) mainly involve minor groove bis-intercalation; in other words, the minor groove complexes are in general more stable than the major groove ones.

Table III is obtained when, for each groove, the complexes are arranged by their energy difference dEc with respect to complex 6a (dEc = Es - Es(6a), in kcal/mol).
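A sketch of the dEc ranking, using a few rows of Table I. Here dEc is taken as Es minus the Es of complex 6a (CGCG, major groove), which reproduces the values listed in Table III. Only five sequences are included to keep the example short, and the function name is ours.

```python
# Es (kcal/mol) for a few sequences from Table I: (major groove M, minor groove m)
ES = {
    "CGCG": (-112.5, -111.2),
    "CACG": (-108.1, -86.1),
    "TACG": (-105.8, -115.9),
    "CATA": (-71.5, -113.3),
    "GCGC": (-78.2, -56.2),
}
REFERENCE = ES["CGCG"][0]  # complex 6a: CGCG bisintercalated from the major groove


def dec_ranking(table, groove):
    """Return [(sequence, dEc), ...] sorted by dEc = Es - Es(6a) for one groove."""
    column = 0 if groove == "M" else 1
    rows = [(seq, round(es[column] - REFERENCE, 1)) for seq, es in table.items()]
    return sorted(rows, key=lambda row: row[1])
```

For the minor groove this puts TACG first with dEc = -3.4, as in Table III.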
TABLE III

Arrangement of the complexes of Table I in order of increasing destabilisation (dEc, kcal/mol) with respect to 202/d(CGCG)2 (complex 6a)

MAJOR GROOVE            MINOR GROOVE
NUCLEOTIDE   dEc        NUCLEOTIDE   dEc
CGCG         0.0        TACG         -3.4
CACG         4.4        CATA         -0.8
TACG         6.7        CATG         -0.8
TACA         9.9        CGCG         1.3
CACA         10.3       CGCA         4.2
CGCA         14.6       TATA         4.4
TGCA         15.2       TACA         24.9
GTAC         15.5       CACG         26.4
ATAC         20.2       CACA         28.8
ATAT         25.3       ATAT         33.7
ACAC         28.2       TGCA         36.0
GCAC         30.4       ACAT         37.6
GCGC         34.3       GCAT         37.8
GCAT         36.3       ATAC         38.3
TATA         40.2       ACAC         38.8
ACAT         40.4       GTAC         39.2
ACGC         40.6       GCAC         43.9
CATA         41.0       ACGT         46.6
ACGT         43.8       ACGC         48.1
CATG         45.4       GCGC         56.3

Table III shows that i) for either mode of bis-intercalation, major groove or minor groove, the (PyPu)2 pattern is favored over the (PuPy)2 one; ii) the tetramers TACG, CGCA and CGCG form stable complexes in both grooves, whereas ACGT and ACGC form the least stable complexes in both grooves; iii) a weak preference for CpA, CpG, TpA and TpG steps is observed, based on how often a given step appears among the complexes with dEc below 15 kcal/mol; iv) apart from (i), no other trend in step preference for the coupled dinucleotide steps BP1-p-BP2 and BP3-p-BP4 can be extracted, whether within the major groove, within the minor groove, or when both grooves are compared (BP: base pair).

Ditercalinium therefore does not seem to show any particularly strong sequence specificity. On the other hand, depending on the sequence, it shows a certain groove specificity, which we call sequence-dependent groove preference.


CONCLUSION 

The energy minimisation calculations indicate:

- an increased stabilisation of the minor groove complex over the corresponding major groove complex in the case of TACG, CATA, CATG, CGCA and TATA. Other complexes, such as those with CACG and TACA, are major groove complexes. Complexes with CGCG, GCAT, ACAT and ACGT are energetically indifferent to either groove.

- the number of favored major groove complexes is larger than the number of favored minor groove complexes.

- regardless of the groove from which ditercalinium bisintercalates, the (PyPu)2 pattern is always more favored than the (PuPy)2 one.

Finally, the prediction of bis-intercalation from the minor groove side for the TATA sequence agrees with the theory of the molecular electrostatic potential as applied to DNA (11, and references therein). The mode of intercalation of ditercalinium with a TpApTpA sequence is presently being investigated in this laboratory by NMR techniques, including ³¹P NMR.
SUMMARY

An overview is given of recent NMR studies on the structures of crambin, the lac repressor headpiece and the complex formed between the latter and the lac operator. The solution structure of crambin was determined using relaxation matrix calculations, and aspects of this technique are discussed. Docking calculations for the lac headpiece-operator complex are reviewed. The structural consequences of new NMR data obtained for the lac headpiece are discussed. Advantages and applications of three-dimensional NMR spectroscopy are addressed.


INTRODUCTION 

In the last decade NMR spectroscopy has emerged as an important tool for the structure determination of biomolecules in solution. The availability of high-field spectrometers and fast computer systems made it possible to develop two-dimensional (2D) NMR techniques with which the resonance assignment problem could be tackled (ref. 1). Strategies to assign ¹H resonances in protein and DNA spectra were developed (ref. 2). Finally, methods have become available to determine the structure from NMR distance constraints, such as Distance Geometry (DG) and restrained Molecular Dynamics (MD) (refs. 2, 3).

NMR structures are primarily based on proton-proton distances obtained from the nuclear Overhauser effect (NOE). The NOE originates from dipolar cross-relaxation between protons. To first order, the strength of the NOE is inversely proportional to the sixth power of the distance between two protons. By calibrating against NOEs observed for protons at a known distance, relative cross-peak intensities can be translated into constraints on proton-proton distances, which can then be used to determine the structure. An important drawback of this so-called two-spin approximation is that it ignores spin diffusion, i.e. contributions to dipolar relaxation via other neighbouring protons. This approximation has been shown to yield distances that may be in error by as much as 20 to 40 percent (ref. 4). To account for spin diffusion it is necessary to solve the relaxation equations for all spins simultaneously. A new method to handle this problem will be discussed.

Furthermore, we will review the NMR structural studies of the interactions between a protein, the lac repressor headpiece, and its DNA binding site. These studies are a good example of the power of NMR methods to elucidate important structural details. Not only was the orientation of the protein with respect to the DNA found to differ by approximately 180° from that of other DNA binding proteins (ref. 5), but specific protein-DNA interactions were also predicted from the NMR structure, and these have recently been found in genetic experiments with the complete lac repressor (ref. 6).

Finally, we will discuss the development of three-dimensional (3D) NMR experiments. One advantage of the new technique is its increased resolution, which makes it possible to analyze larger molecules. With 3D techniques it also becomes possible to observe spin diffusion pathways directly.


RELAXATION  THEORY 

Dipolar relaxation in a multispin system is approximately described by a Bloch-type equation of the form

$$A(t_m) = A(0)\,\exp(-R\,t_m) \qquad (1)$$

where A is the matrix of normalized cross-peak intensities observed in a 2D NOE experiment with mixing time $t_m$, and R is the relaxation matrix. Its elements $R_{ij}$ can be related to the spectral densities of the motion of the vector connecting spin i to spin j. Assuming that the molecule rotates isotropically while remaining rigid, the relaxation matrix elements may be calculated theoretically from the interproton distances and the rotational correlation time $\tau_c$ (refs. 1, 7, 8):
where $K = 0.1\,\gamma^4\hbar^2(\mu_0/4\pi)^2$. For short mixing times eqn. (1) may be expanded in a Taylor series

$$A(t_m) = A(0)\left[1 - R\,t_m + \tfrac{1}{2}R^2 t_m^2 - \cdots\right] \qquad (3)$$

Keeping only the first two terms, one simply has

$$A_{ij} \propto t_m\,\tau_c\,d_{ij}^{-6} \qquad (4)$$

When the distance $d_{cal}$ for a particular proton pair is known, the corresponding NOE intensity $A_{cal}$ can be used to convert intensities to distances, using

$$d_{ij} = d_{cal}\,(A_{cal}/A_{ij})^{1/6} \qquad (5)$$
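Eqn. (5) amounts to a one-line calibration. The sketch below assumes positive, baseline-corrected intensities measured at the same short mixing time; the function name and the 2.5 Å reference distance in the example are hypothetical.

```python
def distance_from_noe(a_ij, a_cal, d_cal):
    """Two-spin estimate of eqn. (5): d_ij = d_cal * (A_cal / A_ij)^(1/6).

    a_ij, a_cal: NOE intensities of the unknown and the calibration pair,
    measured at the same (short) mixing time. d_cal in Angstrom.
    """
    if a_ij <= 0.0 or a_cal <= 0.0:
        raise ValueError("NOE intensities must be positive")
    return d_cal * (a_cal / a_ij) ** (1.0 / 6.0)


# Hypothetical example: a cross peak 64 times weaker than a 2.5 A reference
print(distance_from_noe(1.0 / 64.0, 1.0, 2.5))  # about 5.0
```

Because of the sixth root, an error of a factor of two in intensity shifts the distance estimate by only about 12 percent (2^(1/6) ≈ 1.12).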

In practice it is difficult to obtain accurate NOE values at mixing times short enough for the approximation of eqn. (4) to be valid. At longer mixing times the NOEs are stronger and more easily observed, but the two-spin approximation then breaks down and indirect magnetization transfer, or "spin diffusion", is observed: neighbouring spins start to contribute to the buildup of NOE intensity. For this reason the NOE information is usually translated into distance ranges rather than precise distances (refs. 2, 3).

RELAXATION  MATRIX  CALCULATIONS 
Iterative  Relaxation  Matrix  Approach 

It is possible to treat spin diffusion exactly within the context of eqn. (1). By diagonalizing the complete relaxation matrix for a given structural model, eqn. (1) may be solved to calculate the NOE matrix, which can then be compared with the experimental data (ref. 8). This method has been used to examine various model structures of oligonucleotides in solution (ref. 9). Working in the opposite direction, it is possible to "back-transform" the experimental NOE matrix to obtain the cross-relaxation rates, and thus the interproton distances corrected for spin diffusion (ref. 10).
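When the complete NOE matrix is known, eqn. (1) can be solved in both directions by diagonalization, as described above. The following is a minimal NumPy sketch assuming that A(0) is the identity matrix (normalized intensities) and that R is symmetric. The function names are ours, and the 3 x 3 matrix is illustrative rather than derived from a real structure.

```python
import numpy as np


def noe_from_relaxation(r, t_mix):
    """Eqn. (1) with A(0) = I: A(t_m) = V diag(exp(-lambda t_m)) V^T."""
    lam, v = np.linalg.eigh(r)  # R is real symmetric
    return v @ np.diag(np.exp(-lam * t_mix)) @ v.T


def relaxation_from_noe(a, t_mix):
    """'Back transform' of a complete NOE matrix: R = -ln(A) / t_m."""
    lam, v = np.linalg.eigh(0.5 * (a + a.T))  # symmetrize (noisy data are not exactly symmetric)
    if np.any(lam <= 0.0):
        raise ValueError("NOE matrix must be positive definite to take its logarithm")
    return v @ np.diag(-np.log(lam) / t_mix) @ v.T


# Illustrative 3-spin relaxation matrix (s^-1); off-diagonal terms are cross-relaxation rates
R = np.array([[0.58, -0.45, -0.06],
              [-0.45, 1.04, -0.45],
              [-0.06, -0.45, 0.58]])
A = noe_from_relaxation(R, 0.3)
print(np.allclose(relaxation_from_noe(A, 0.3), R))  # True
```

With real data only part of A is measured, so this direct inversion is not available. That is the limitation the iterative approach described next is designed to work around.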

For medium-sized and large biomolecules such as proteins and DNA, only part of the NOE matrix can be obtained experimentally, so the direct computation of R from eqn. (1) is not possible. As first shown by Boelens et al. (refs. 4, 11), this problem can be circumvented by supplementing the experimental information with theoretical data calculated from a model structure. The so-called Iterative Relaxation Matrix Approach (IRMA) starts from an initial structure, e.g. a linear chain. Theoretical NOE intensities are computed and combined with the experimental values after the latter have been scaled using a suitable set of calibration peaks; i.e. theoretical values are replaced by experimental ones for each proton pair for which an NOE has been observed. Using eqn. (1), a new relaxation matrix is computed from the combined NOE matrix. The off-diagonal relaxation matrix elements $R_{ij}$ are translated directly into distance bounds, taking into account the uncertainty in $R_{ij}$ as reflected in its variation with the mixing time (ref. 11). Structure calculations (DG and/or MD) are then performed with the calculated distance bounds on all proton pairs for which NOEs were observed. The resulting structure is used as the starting point for a new IRMA cycle, i.e. calculation of theoretical NOEs and so on. The process is repeated until the structure and the distance constraints converge.
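The constraint-generation part of one IRMA cycle can be sketched as below. This is our illustration of the procedure described in the text, not the authors' program. The function names are hypothetical, experimental intensities are assumed to be already calibrated, and the conversion of a rate to a distance uses an assumed proportionality R_ij = c·d_ij^-6 for a rigid, isotropically tumbling molecule. Distance bounds are taken from the spread over mixing times, and the subsequent DG/MD step is not shown.

```python
import numpy as np


def merge_noe(a_theory, a_exp, observed, scale=1.0):
    """Theoretical NOE matrix with observed entries replaced by scaled experimental ones."""
    merged = np.array(a_theory, dtype=float)
    merged[observed] = scale * np.asarray(a_exp, dtype=float)[observed]
    return merged


def back_transform(a, t_mix):
    """R = -ln(A) / t_m (eqn. (1), A(0) = I), via eigendecomposition."""
    lam, v = np.linalg.eigh(0.5 * (a + a.T))
    if np.any(lam <= 0.0):
        raise ValueError("merged NOE matrix is not positive definite")
    return v @ np.diag(-np.log(lam) / t_mix) @ v.T


def irma_bounds(a_theory, a_exp, observed, c):
    """One constraint-generation step of an IRMA-like cycle (sketch).

    a_theory, a_exp: dicts {mixing time: NOE matrix}, experimental data
    already calibrated. observed: symmetric boolean mask of observed NOEs.
    c: assumed constant in R_ij = c * d_ij**-6 (negative for slow tumbling).
    Returns {(i, j): (lower, upper)} distance bounds from the spread of the
    back-transformed rates over the mixing times.
    """
    distances = {}
    for t_mix in sorted(a_exp):
        r = back_transform(merge_noe(a_theory[t_mix], a_exp[t_mix], observed), t_mix)
        for i, j in zip(*np.nonzero(np.triu(observed, k=1))):
            d = (r[i, j] / c) ** (-1.0 / 6.0)
            distances.setdefault((int(i), int(j)), []).append(float(d))
    return {pair: (min(ds), max(ds)) for pair, ds in distances.items()}
```

When the model and the data are fully consistent, the bounds collapse to a single distance. Any real spread reflects noise and the inconsistency between the model and the experimental NOEs.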

IRMA  was  first  applied  to  a  DNA  octamer  (ref.  11)  and  showed  good  convergence 
properties,  irrespective  of  the  starting  structure  (both  A-  and  B-DNA  models  were 
used).  The  accuracy  of  the  procedure  was  such  that  effects  of  local  mobility,  which  are 
not  included  in  the  present  treatment,  could  be  observed  in  the  pattern  of  residual 
constraint  violations. 

The  importance  of  correcting  for  spin  diffusion  is  shown  by  a  comparison  of 
distances  as  calculated  directly  from  theoretical  NOE  buildups  via  the  two-spin 
approximation,  eqns.  (4-5),  with  the  real  distances  (ref.  4).  The  relative  error  varies 
linearly  with  the  distance;  distances  not  corrected  for  spin  diffusion  have  a  narrower 
distribution  than  the  real  distances.  For  the  calibration  distance  used  in  this  example 
the  error  ranges  from  -20  to  +20  percent. 
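For reference, the two-spin approximation converts an NOE intensity (or initial buildup rate) into a distance by calibration against a proton pair of known separation, r_ij = r_cal (a_cal / a_ij)^(1/6). The minimal sketch below assumes that form for eqns. (4-5), which are not reproduced in this section, and the function name is hypothetical. Because all other spins are ignored, any intensity gained or lost through spin diffusion feeds straight into the distance estimate.

```python
def two_spin_distance(a_ij, a_cal, r_cal):
    """Two-spin approximation: distance from an NOE intensity, calibrated on a known pair.

    a_ij  : NOE intensity (or initial buildup rate) of the pair of interest
    a_cal : intensity of the calibration pair, measured under the same conditions
    r_cal : known distance of the calibration pair (e.g. in angstrom)
    """
    if a_ij <= 0 or a_cal <= 0:
        raise ValueError("intensities must be positive")
    return r_cal * (a_cal / a_ij) ** (1.0 / 6.0)


# A pair 64 times weaker than the calibration pair is estimated to lie twice as far away.
print(two_spin_distance(a_ij=0.01 / 64, a_cal=0.01, r_cal=1.8))
```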



Results  for  crambin 

Recently  IRMA  has  also  been  applied  to  a  protein  (ref.  12).  The  solution  structure  of 
crambin  was  determined  using  more  than  600  experimental  NOEs,  each  measured  at 
6  different  mixing  times.  Effects  of  methyl  rotation  and  aromatic  ring  flips  were  taken 
into  account  (ref.  13).  Structure  calculations  employed  the  Iterative  Relaxation  Matrix 
Approach  in  combination  with  DG  and  MD  simulations.  The  initial  input  to  IRMA,  used 
to  calculate  the  first  set  of  constraints,  consisted  of  the  fully  extended  peptide  chain. 
The final structures after two cycles of IRMA show a backbone r.m.s. variation of
approximately 1 Å. The difference from the X-ray structure is also 1 Å (ref. 12).


Fig. 1a (left, "initial structure vs. final structure") and 1b (right, "IRMA second cycle vs. final structure"). Diagrams showing convergence of the IRMA procedure. For each observed NOE a point is plotted. Horizontal axis: distance in the final structure. Vertical axis: distance in the initial structure (left) and distance from IRMA after two cycles (right).


Fig. 1 illustrates how IRMA succeeds in transforming the experimental data into
distance constraints. Fig. 1a plots distances d_ext in the initial extended chain structure
against distances d_sol in one of the best final solution structures; only data
corresponding to observed NOEs are shown. The r.m.s. difference <(d_ext - d_sol)^2>^(1/2) is
24.8 Å. The distribution of distances after two cycles is shown in Fig. 1b. Due to the
large number of experimental data the calculations converge rapidly. Already in the
first cycle, before any structure calculation, the r.m.s. difference between the IRMA
distances d_IRMA and d_sol is down to 0.58 Å. This value is virtually unchanged in the
next cycles. The r.m.s. in terms of IRMA upper bounds is even lower: 0.43 Å.
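The r.m.s. figures quoted above have the form <(d_1 - d_2)^2>^(1/2), averaged over the proton pairs with an observed NOE. A minimal sketch follows; the function name and the example distances are hypothetical.

```python
import numpy as np


def rms_difference(d_a, d_b):
    """r.m.s. difference <(d_a - d_b)^2>^(1/2) over matched proton-pair distances."""
    d_a = np.asarray(d_a, dtype=float)
    d_b = np.asarray(d_b, dtype=float)
    if d_a.shape != d_b.shape:
        raise ValueError("distance lists must refer to the same proton pairs")
    return float(np.sqrt(np.mean((d_a - d_b) ** 2)))


# Hypothetical distances (angstrom) for three observed NOE pairs.
d_ext = [9.0, 14.5, 6.2]
d_sol = [3.1, 4.0, 2.9]
print(rms_difference(d_ext, d_sol))
```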

It may be somewhat surprising that the final structure does not completely satisfy
the IRMA upper bounds. This is caused by the impossibility of discriminating
experimentally between protons on prochiral centres, such as β-β' pairs and the two
methyl groups of Val and of Leu residues. In the plots we simply calculated the
distance to the average positions in these cases. In the simulations, however, pseudo-
atoms were used and many bounds were increased by 1 Å or more, which reduces
the precision with which side chains can be positioned. So although the optimized
structures in the simulations all have low average constraint violations of less than
0.1 Å, which seems well within the present accuracy of the IRMA method, the theoretical
buildups predicted from these structures may not completely match the experimental
NOEs.
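The two ways of handling a pair of protons without stereospecific assignment can be contrasted in a small sketch. For the plots, the distance is taken to the average position (midpoint) of the two protons. For the simulations, a pseudoatom is used and the upper bound is loosened. The fixed 1.0 Å correction below is only an illustrative stand-in for the "1 Å or more" mentioned in the text, and the function names and coordinates are hypothetical.

```python
import numpy as np


def distance_to_average_position(h, prochiral_pair):
    """Distance from proton h to the mean position of two unassigned prochiral protons."""
    centre = np.mean(np.asarray(prochiral_pair, dtype=float), axis=0)
    return float(np.linalg.norm(np.asarray(h, dtype=float) - centre))


def pseudoatom_upper_bound(bound, correction=1.0):
    """Loosened upper bound for a pseudoatom (the correction is an illustrative value)."""
    return bound + correction


h = [0.0, 0.0, 0.0]
hb2, hb3 = [3.0, 0.9, 0.0], [3.0, -0.9, 0.0]  # hypothetical beta-proton positions
print(distance_to_average_position(h, [hb2, hb3]), pseudoatom_upper_bound(3.5))
```

Here each individual proton is about 3.13 Å from h while their midpoint is 3.0 Å away. The loosened pseudoatom bound accepts either assignment, but at the cost of precision.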

Using an NMR R-factor as a measure of the correspondence between calculated
and  measured  NOEs  it  is  possible  to  discriminate  between  structures,  and  also 
pinpoint  regions  or  groups  in  the  molecule  for  which  the  correspondence  is  less  good 
(ref.  12).  In  the  same  spirit  it  appears  possible  to  obtain  stereospecific  assignments 
directly  from  a  comparison  between  theoretical  and  experimental  NOE  values  (data 
not  shown).  Together  with  methods  to  calculate  local  mobility  corrections  to  the  rigid 
molecule  model  these  procedures  provide  the  framework  for  extending  the 
applicability  of  IRMA  in  obtaining  high-quality  structures  with  NMR. 
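An NMR R-factor can be defined in several ways, and the exact definition used in ref. 12 is not given here. The sketch below assumes one simple form, by analogy with the crystallographic R-factor: R = Σ|a_exp - a_calc| / Σ a_exp. It can be evaluated over all observed NOEs or over the subset belonging to one residue or group, which is how poorly fitting regions can be pinpointed. The per-group breakdown, the names and the numbers are illustrative assumptions.

```python
import numpy as np


def nmr_r_factor(a_exp, a_calc):
    """R = sum|a_exp - a_calc| / sum a_exp (one simple, assumed definition)."""
    a_exp = np.asarray(a_exp, dtype=float)
    a_calc = np.asarray(a_calc, dtype=float)
    return float(np.sum(np.abs(a_exp - a_calc)) / np.sum(a_exp))


def r_factor_per_group(a_exp, a_calc, groups):
    """R-factor restricted to the NOEs carrying each group label."""
    a_exp, a_calc, groups = map(np.asarray, (a_exp, a_calc, groups))
    return {g: nmr_r_factor(a_exp[groups == g], a_calc[groups == g])
            for g in np.unique(groups)}


a_exp = [1.0, 0.5, 0.2, 0.1]     # hypothetical scaled experimental NOEs
a_calc = [0.9, 0.5, 0.3, 0.1]    # hypothetical back-calculated NOEs
groups = ["A", "A", "B", "B"]    # hypothetical residue or group labels
print(nmr_r_factor(a_exp, a_calc), r_factor_per_group(a_exp, a_calc, groups))
```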

LAC  REPRESSOR  HEADPIECE  -  OPERATOR  COMPLEX 
Main  structural  features 

One  of  the  largest  systems  studied  by  NMR  is  the  specific  protein-DNA  complex 
formed  between  the  headpiece,  i.e.  the  N-terminal  fragment,  of  lac  repressor  protein 
and  a  symmetric  22  base  pair  (bp)  lac  operator  fragment  (ref.  14).  The  biological 
function  of  lac  repressor  is  to  regulate  the  transcription  of  the  lac  genes  in  E.  coli. 
Expression  is  shut  off  by  binding  of  repressor  to  the  operator;  transcription  of  the 
lactose  enzymes  can  be  induced  again  by  allolactose  which  causes  the  protein  to  lose 
its  specific  binding  capacity  and  thus  disrupts  the  complex.  The  system  has  been  very 
well  characterized,  both  biochemically  and  genetically  (ref.  15).  The  natural  repressor 
binds  as  a  tetramer  to  DNA.  The  headpieces  of  two  subunits  appear  to  be  involved  in 
binding  to  the  operator,  which  indeed  has  an  approximate  two-fold  symmetry. 
Proteolytic  cleavage  yields  headpiece  fragments  consisting  of  51,  56  or  59  residues, 
depending  on  the  proteolytic  enzyme.  These  fragments  retain  their  original  3D 
structure  and  their  ability  to  bind  specifically  to  a  half-operator. 

The  complete  lac  repressor,  with  a  molecular  weight  of  154,000,  is  too  large  to 
study  in  detail  with  present  day  NMR  techniques.  As  model  systems  we  therefore 
employed  the  complexes  formed  between  different  headpieces  (51,  56,  59)  and 
synthetic operators (11 or 14 bp) corresponding to the left, i.e. the more strongly
binding,  half  of  the  wild  type  operator  (ref.  15).  Recently  we  also  analysed  the  NOE 
spectra  of  the  complex  formed  between  the  22  bp  symmetric  operator  fragment  and 
two  headpieces,  which  has  a  molecular  weight  of  approximately  25,000  (ref.  14). 

First  the  structure  of  headpiece  (HP)  itself  was  determined,  using  a  combination  of 
model  building  and  restrained  MD  simulations  (refs.  16,  17).  The  molecule  was  found 
to consist of three α-helices packed around a hydrophobic core. The DNA binding
domain  appeared  to  fold  in  a  helix-turn-helix  motif,  confirming  suggestions  based 
upon  sequence  homology  between  the  DNA  binding  domains  of  lac  repressor  and  a 
number  of  other  proteins  for  which  crystal  structures  had  been  determined,  such  as 
cro, λ and trp repressors, and CAP (refs. 15, 18).

From 2D NMR studies on the lac headpiece complex, first performed with HP56-14 bp
(refs. 5, 19) and later with the HP56-11 bp and HP56-22 bp combinations (refs. 14,
20), it readily became apparent that both protein and DNA essentially retain their
free-state conformations. The packing of the helices stays intact, and the operator
remains in its B-DNA type structure, although some local deformations cannot be
excluded. No essential differences are observed between the different complexes,
apart  from  an  apparent  difference  in  binding  strength. 

Despite  the  complexity  of  the  2D  spectra  due  to  the  size  of  the  system,  which  has  a 
molecular  weight  of  14,000,  a  number  of  NOEs  between  protein  and  DNA  could  be 
identified. These involve only contacts between apolar groups, since the H2O spectra
in  which  exchangeable  protons  can  be  observed  are  difficult  to  analyse  in  detail. 
Nevertheless  the  pattern  of  NOEs  is  consistent  with  the  data  obtained  in  protection 
experiments  and  in  genetic  studies:  the  second  or  "recognition"  helix,  extending  from 
residue  17  to  25,  binds  in  the  major  groove  of  DNA. 

This  mode  of  interaction  corresponds  to  what  had  been  inferred  from  genetic  data 
and  model  building  studies,  both  for  lac  headpiece  and  for  other  repressors  (ref.  18). 
However,  the  NMR  data  show  conclusively  that  the  orientation  of  the  lac  headpiece 
recognition  helix  with  respect  to  the  DNA  axis  is  the  opposite  of  that  in  the  model 
proposed  on  the  basis  of  the  supposed  analogy  to  the  models  of  cro  and  CAP 
complexes. 

As  is  clear  from  Fig.  2  the  cluster  of  NOE  contacts  seen  from  Thr-5,  Leu-6  and  Tyr-7 
in  the  first  helix  to  base  pairs  9  and  10,  and  the  NOE  cluster  from  His-29  in  the  loop 
between  the  second  and  the  third  helix  to  base  pairs  2  and  3,  uniquely  determine  the 
orientation  of  the  protein.  Further  evidence  comes  from  an  NMR  study  of  the  complex 
between  the  22  bp  symmetric  operator  and  headpiece  56  (ref.  14).  This  2:1  complex 
was  found  to  be  completely  symmetric,  which  means  that  the  two  headpieces  bind 
simultaneously  in  identical  fashion.  The  NOE  pattern  showed  that  the  binding  mode  is 
the  same  as  that  observed  in  complexes  of  one  headpiece  with  a  half  operator. 


Fig. 2.  Schematic  drawing  of  the  lac  headpiece  operator  complex.  The  black  dots 
indicate  the  phosphates  where  ethylation  interferes  with  lac  repressor  binding  (ref.  37). 


These  results  support  the  idea  that  the  complete  repressor  binds  with  the  same 
"reversed"  orientation  as  the  monomeric  model  systems.  Recent  genetic  experiments 
have  provided  evidence  that  this  is  indeed  the  case.  Lehming  et  al.  (ref.  6)  investigated 
the  binding  between  operators  and  repressors  which  had  mutations  at  one  or  two 
positions.  By  correlating  replacements  with  affinity  changes  they  inferred  interactions 
between  the  first  two  residues  of  the  recognition  helix  and  base  pairs  7  and  9,  and 
between  the  sixth  residue  (Arg-22  in  wild  type)  and  base  pair  5.  Especially  the  latter 
result  is  strong  evidence  that  the  orientation  observed  with  NMR  is  the  same  as  that  of 
the  complete  repressor. 

By  now  a  number  of  crystal  structures  of  repressor-operator  complexes  have  been 
determined  (refs.  21-23).  In  all  these  cases  the  orientation  agrees  with  the  earlier 
models proposed for λ, cro etc. Therefore two modes of binding of helix-turn-helix
proteins  to  DNA  appear  to  exist,  which  can  be  designated  as  the  lac  and  cro  types. 
From  homology  arguments  and  the  genetic  data  obtained  by  Lehming  et  al.  (ref.  24)  it 
is  likely  that  at  least  gal  and  deo  repressors  and  maybe  also  CAP  (ref.  6)  belong  to  the 
lac  type  binding  mode. 




Detailed  simulations 

On  the  basis  of  the  NMR  data  we  attempted  to  model  the  structure  of  the  lac 
headpiece-operator  complex  in  more  detail.  The  uncertainties  here  are  much  larger 
than  for  headpiece  itself.  Firstly  the  number  of  NOEs  observed  between  protein  and 
DNA  is  quite  small  (24  at  the  time  of  these  simulations).  Secondly,  these  NOEs  were 
determined  at  relatively  long  mixing  times  in  order  to  enhance  their  appearance. 
Therefore  spin  diffusion  contributes  to  the  intensity,  and  relatively  large  upper  distance 
bounds  had  to  be  used  (but  this  can  be  remedied  by  using  IRMA).  We  used  limits  of  4 
and 6 Å depending on the strength of the NOE.

An  important  problem  in  such  studies  of  an  intermolecular  complex  is  how  an 
optimal  alignment  of  the  two  subunits  can  be  attained.  Visual  inspection  using  a 
molecular  graphics  system  obviously  has  its  limitations;  on  the  other  hand  restrained 
MD  simulations  are  liable  to  end  up  in  the  wrong  minimum,  since  the  molecules  do  not 
easily  disengage  on  a  picosecond  time  scale  once  they  are  locked  into  a  particular 
configuration.  We  obtained  good  results  with  another  kind  of  optimization  procedure 
called  the  Ellipsoid  Algorithm.  Essentially  this  is  a  minimization  procedure  which 
avoids  local  minima  by  making  large,  discontinuous  steps  of  gradually  decreasing 
size.  An  extra  advantage  stems  from  handling  only  one  constraint  per  step  instead  of 
the  total  constraint  function. 
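The central idea of an ellipsoid method can be illustrated with the textbook central-cut version, applied to a feasibility problem in which each step uses the gradient of just one violated constraint. This is a generic sketch and not Billeter's implementation (ref. 25). It moves a single point in a few dimensions rather than rigid-body coordinates. It uses convex upper-bound constraints ||x - a_k|| <= u_k, for which the method is guaranteed to make progress, and picks the most-violated constraint at each step. The centre jumps discontinuously while the ellipsoid shrinks, mirroring the large steps of decreasing size described above. Real docking constraints are not convex, so there the method is a heuristic. The function name and test values are hypothetical.

```python
import numpy as np


def ellipsoid_feasible_point(anchors, upper, centre, radius, tol=1e-6, max_iter=2000):
    """Central-cut ellipsoid method for constraints ||x - a_k|| <= u_k.

    The ellipsoid is {x : (x - c)^T P^-1 (x - c) <= 1}; it starts as a ball and
    every step cuts it with the gradient of the single most-violated constraint.
    """
    anchors = np.asarray(anchors, dtype=float)
    upper = np.asarray(upper, dtype=float)
    c = np.asarray(centre, dtype=float)
    n = c.size
    if n < 2:
        raise ValueError("the update formula below requires n >= 2")
    P = radius ** 2 * np.eye(n)
    for _ in range(max_iter):
        dist = np.linalg.norm(anchors - c, axis=1)
        viol = dist - upper
        k = int(np.argmax(viol))
        if viol[k] <= tol:
            return c                         # all constraints satisfied (within tol)
        g = (c - anchors[k]) / dist[k]       # gradient of ||x - a_k|| at the centre
        gt = g / np.sqrt(g @ P @ g)
        Pg = P @ gt
        c = c - Pg / (n + 1)                 # move the centre into the kept half
        P = n ** 2 / (n ** 2 - 1) * (P - 2.0 / (n + 1) * np.outer(Pg, Pg))
    raise RuntimeError("no feasible point found within max_iter steps")


anchors = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]   # hypothetical fixed atoms
upper = [1.0, 1.0, 1.0]                           # hypothetical upper bounds
x = ellipsoid_feasible_point(anchors, upper, centre=[5.0, 5.0], radius=10.0)
print(x, np.linalg.norm(np.asarray(anchors) - x, axis=1))
```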

Using Billeter's implementation (ref. 25) we attempted the docking of rigid
headpiece and operator fragments (ref. 26). Initial co-ordinates were taken from a
model-built structure of the complex, which had been optimized by a very short (5 ps)
MD run (ref. 27). The total violation of protein-DNA constraints was less than 1 Å after
each of the three dockings that were performed. By comparison, the total violation was
3.6 Å in the starting structure and 13.0 Å in the model-built one. In the docking
procedure the headpiece moved along the DNA by almost one base pair,
corresponding to an r.m.s. change in Cα positions of 5 to 7 Å.
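The two figures of merit in this paragraph can be written down explicitly. The sketch below assumes that the "total violation" is the sum, over all protein-DNA constraints, of the amounts by which distances exceed their upper bounds. It also assumes that the r.m.s. change in Cα positions is computed without re-superposition, because the DNA frame is common to both structures. Both readings, and all names and numbers, are assumptions for illustration.

```python
import numpy as np


def total_upper_bound_violation(coords, pairs, upper):
    """Sum of max(0, d_ij - u_ij) over constrained atom pairs (i, j)."""
    coords = np.asarray(coords, dtype=float)
    i, j = np.asarray(pairs).T
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    return float(np.sum(np.clip(d - np.asarray(upper, dtype=float), 0.0, None)))


def rms_displacement(before, after):
    """r.m.s. change in positions (e.g. C-alpha atoms) in a common frame, no fitting."""
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


coords = [[0, 0, 0], [5, 0, 0], [0, 3, 0]]   # hypothetical atom positions (angstrom)
print(total_upper_bound_violation(coords, [(0, 1), (0, 2)], [4.0, 4.0]))
print(rms_displacement([[0, 0, 0], [1, 0, 0]], [[3, 4, 0], [1, 0, 0]]))
```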

While the ellipsoid algorithm is good at finding global minima of a relatively smooth
potential,  such  as  the  total  protein-DNA  constraint  function,  it  is  less  good  at  optimizing 
local  details  such  as  the  exact  position  of  side  chains,  which  involve  rapidly  changing 
repulsive  potentials.  Further  optimizations  were  therefore  performed  with  restrained 
EM  and  MD  using  the  GROMOS  package  (ref.  28).  As  before  the  DNA  was  kept  in  the 
B-DNA  form,  and  the  constraints  as  derived  for  the  free  headpiece  were  applied  to  the 
protein. 

In  these  runs  we  also  put  hydrogen  bond  constraints  between  G7  and  Gln-18  and 
between  G5  and  Arg-22.  The  contact  between  G7  and  Gln-18  was  assigned  on  the 
basis  of  mutant  studies  by  Ebright  (ref.  29)  and  by  Lehming  et  al.  (ref.  24).  The  contact 
between  G5  and  Arg-22  was  postulated  by  us  on  the  basis  of  the  NMR  data  and  model 
building  (ref.  5)  and  is  supported  by  recent  mutant  studies  (ref.  6).  Because  it  was  not 
known  a  priori  which  nitrogen  acts  as  hydrogen  bond  donor  in  Arg  and  whether  N  or  O 
is the acceptor atom in guanine, the constraints had to be fairly liberal: 6.0 Å for Gln
and 7.3 Å for Arg.

Several  MD  simulations  of  10  or  20  ps  were  performed  starting  from  different 
configurations (ref. 26). The total constraint violations always dropped rapidly to a few
tenths of an Å. Resulting structures had Cα r.m.s. differences of up to 3.5 Å. One of the best
structures is shown in Fig. 3. Surprisingly few hydrogen bonds between protein and
DNA were found. The most persistent ones are those between the amide of Leu-6 and
the phosphate of C9, and between Nε of Gln-18 and (mainly) N7 of G7. The latter result
shows  that  one  of  the  two  postulated  contacts  is  perfectly  compatible  with  the  NMR 
data. 
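Statements about the "most persistent" hydrogen bonds are usually based on the fraction of trajectory frames in which a donor-acceptor distance stays below a cut-off. A minimal sketch follows. It assumes a purely distance-based criterion with a hypothetical 3.5 Å cut-off. The criterion actually used in ref. 26 is not stated here, and real analyses often add an angle condition.

```python
import numpy as np


def hbond_occupancy(traj, donor, acceptor, cutoff=3.5):
    """Fraction of frames with donor-acceptor distance <= cutoff (angstrom).

    traj: array of shape (n_frames, n_atoms, 3); donor/acceptor: atom indices.
    """
    traj = np.asarray(traj, dtype=float)
    d = np.linalg.norm(traj[:, donor] - traj[:, acceptor], axis=1)
    return float(np.mean(d <= cutoff))


# Hypothetical four-frame trajectory of one donor (atom 0) and one acceptor (atom 1).
traj = [[[0, 0, 0], [2.9, 0, 0]], [[0, 0, 0], [3.1, 0, 0]],
        [[0, 0, 0], [4.0, 0, 0]], [[0, 0, 0], [3.3, 0, 0]]]
print(hbond_occupancy(traj, donor=0, acceptor=1))
```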

The  Arg-22  side  chain  on  the  contrary  almost  never  formed  a  hydrogen  bond  with 
G5.  Instead  contacts  were  sometimes  observed  with  T3  and  T4.  Since  the  non-NMR 
constraints  applied  to  test  the  formation  of  a  hydrogen  bond  were  liberal  we  do  not 
think  that  the  presence  of  these  extra  constraints  has  seriously  influenced  the 
calculated  structure.  The  only  other  group  in  the  recognition  helix  that  formed 
hydrogen  bonds  to  bases  of  DNA  was  Ser-21,  which  was  found  to  interact 
alternately with three different bases, G7, C7 and T8.


Fig.  3.  Stereoview  of  a  structure  of  the  lac  headpiece  -  operator  complex,  obtained  by 
ellipsoid  docking,  followed  by  restrained  MD  and  EM  (ref.  26). 


A  similar  picture  has  come  forward  from  simulations  of  the  lac  headpiece  -  operator 
complex in water, performed by de Vlieg et al. (ref. 30). In a trajectory of 125 ps, partly
simulated  without  the  NMR  constraints,  few  hydrogen  bonds  were  found  between  the 
protein  side  chains  and  DNA  bases.  The  ones  that  did  occur  were  mediated  by  a  water 
molecule. On the other hand various long-lived non-polar contacts were seen. This
raises the question of what contribution, if any, hydrogen bond interactions make to
specific recognition.

New data

In  the  last  year  the  identification  of  NOE  cross  peaks  in  spectra  of  the  complex  has 
been  extended,  partly  by  measuring  at  a  spectrometer  frequency  of  600  instead  of  500 
MHz, and partly by analysing spectra of the HP56-11 bp variant, which show less
overlap  than  spectra  of  the  other  systems.  The  total  number  of  constraints  for  the 
protein  has  increased  significantly  from  137  to  207.  Subsequently  many  NOEs  could 
be  traced  back  in  the  spectra  of  the  free  headpiece;  however  the  constraint  sets  for  the 
free  and  the  bound  form  are  no  longer  identical. 

We  performed  a  preliminary  study  of  the  structural  consequences  of  the  new 
constraint data. Starting from a previously obtained MD configuration, 80 ps of
restrained MD were carried out, both for the "free" and the "bound" states, of which the
last  40  ps  were  used  for  analysis.  Averaged  and  subsequently  minimized  structures 
appear  to  differ  significantly  from  the  previous  model.  Similarly  a  difference  is  found 
between  the  free  and  bound  structures,  which  is  localized  around  His-29  in  the  loop 
connecting  helix  2  to  helix  3. 

The  NOE  analysis  uncovered  also  a  number  of  new  protein-DNA  contacts.  In  total 
40  of  these  have  now  been  identified,  while  24  were  used  in  the  simulations  discussed 
above. New contacts involve His-29, Gln-18 as well as Tyr-7 and Tyr-17. In the H2O
spectrum a NOE cross peak has been identified as a contact between the side chain
amide of Gln-18 and the H5 proton of base C7. This is in agreement with genetic data
(refs. 24, 29) but indicates that the Gln side chain is interacting with C7 rather than G7.
Preliminary modelling results show that this contact can easily be accommodated. New
structure  calculations  are  presently  being  undertaken,  and  we  expect  that  the  new 
NOE  data,  in  combination  with  the  use  of  IRMA  (ref.  11),  will  increase  the  precision  of 
the  structure.  An  important  piece  of  information  may  come  from  the  NOEs  observed  for 
Arg-22:  contacts  to  the  protein  backbone  now  appear  to  constrain  the  side  chain  to  a 
more extended conformation than before, which brings the guanidino group into much
closer contact with the G5 base. In what way this will affect the hydrogen bonding
pattern  remains  to  be  seen. 

THREE-DIMENSIONAL  NMR  SPECTROSCOPY 

2D  NMR  has  allowed  the  detailed  analysis  of  NOE  and  J-interactions  in  molecules 
up  to  a  molecular  weight  of  approximately  15,000  (ref.  2).  Beyond  this  limit  the 
increase  in  line  width  and  the  number  of  resonances  cause  too  much  overlap,  even  in 
2D  spectra.  Specific  isotope  labeling  or  comparison  of  spectra  obtained  in  different 
experimental  conditions  may  be  used  to  overcome  this  problem.  A  much  more  general 
method  for  studying  complex  systems  is  the  recently  developed  3D  NMR  technique 
(refs. 31-33). The addition of a third frequency domain helps in two ways. First, the
spectra are less crowded since the interactions, in the form of cross peaks, are spread
out over a larger area. Secondly, assignment of interactions that overlap in 2D spectra
becomes possible in 3D when the additional coherence transfer step links them to
different, uniquely defined nuclei.

Until  now  several  homonuclear  and  heteronuclear  3D  NMR  techniques  have  been 
proposed, such as homonuclear 3D NOE-HOHAHA (refs. 32, 33), 3D NOE-NOE (ref.
34) and heteronuclear 3D NOE-HMQC (refs. 35, 36). The advantages of the
heteronuclear 3D HMQC experiments are the relative simplicity of the spectral analysis and
the fact that they do not rely on the possibly weak J-coupling. However, they do require
that the protein material is isotopically enriched in ¹³C or ¹⁵N, which can be
problematic. Here we will focus on the non-selective 3D NOE-HOHAHA experiment. In
this experiment the increase in resolution is obtained by correlating resonances of ¹H
nuclei  -  the  most  abundant  nuclei  in  biomolecules  -  in  three  frequency  domains. 
Furthermore  the  technique  combines  the  measurement  of  J-  and  NOE-interactions  in 
one  experiment,  and  simultaneously  registers  a  unique  double  magnetization  transfer. 

The  3D  NOE-HOHAHA  experiment  can  be  described  as  the  combination  of  the  2D 
NOE and 2D HOHAHA experiments, as shown in Fig. 4a. The FID in the t3 time domain
is recorded as a function of two variable times t1 and t2, which are independently
incremented.  After  Fourier  transformation  in  the  three  dimensions  a  3D  NMR  spectrum 
is  obtained  with  three  frequency  axes.  A  cross  peak  in  the  3D  spectrum  arises  when 
magnetization  of  one  proton  is  transferred  in  the  first  (NOE)  mixing  period  to  a  second 
proton,  and  then  in  the  second  (HOHAHA)  mixing  period  to  a  third  proton. 
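How such a cross peak emerges from the data can be illustrated with synthetic numbers. If the signal evolves at the frequency of proton a during t1, of proton b during t2 and of proton c during t3, a three-dimensional Fourier transform places a single peak at (ω1, ω2, ω3) = (a, b, c). The sketch below assumes idealized, exponentially damped complex signals whose frequencies fall exactly on digital grid points, given as bin indices. Real data also need phase cycling, apodization and phasing, which are omitted, and the function name is hypothetical.

```python
import numpy as np


def synthetic_3d_cross_peak(k1, k2, k3, n=32, decay=20.0):
    """Time-domain signal s(t1, t2, t3) for a transfer a -> b -> c, with the
    three proton frequencies given as integer bin indices k1, k2, k3."""
    t = np.arange(n)
    s1 = np.exp(2j * np.pi * k1 * t / n - t / decay)   # evolution at a during t1
    s2 = np.exp(2j * np.pi * k2 * t / n - t / decay)   # evolution at b during t2
    s3 = np.exp(2j * np.pi * k3 * t / n - t / decay)   # detection at c during t3
    return s1[:, None, None] * s2[None, :, None] * s3[None, None, :]


spectrum = np.fft.fftn(synthetic_3d_cross_peak(5, 12, 20))
peak = np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape)
print(tuple(int(v) for v in peak))
```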
Fig. 4. A: Pulse sequence for a 3D NOE-HOHAHA experiment.

B:  Cross-diagonal  planes  in  a  3D  NOE-HOHAHA  spectrum. 


Schematically the 3D spectrum can be presented as a cube with axes ω1, ω2 and
ω3. In such a 3D spectrum a body diagonal (ω1 = ω2 = ω3) can be identified, which
contains magnetization that was not transferred during any of the mixing periods.
Additionally intensity will accumulate in three cross-diagonal planes, cf. Fig. 4b. In the
3D NOE-HOHAHA experiment the plane ω2 = ω3 (NOE plane) will contain
magnetization transferred only during the first (NOE) mixing period. Similarly the plane
ω1 = ω2 (HOHAHA plane) will correspond to the second coherence transfer step. Thus
the 3D NOE-HOHAHA experiment contains the information present in the traditional
2D NOE and HOHAHA spectra as a subset. Finally there is the unique ω1 = ω3 plane
(back-transfer plane), which contains magnetization transferred during the NOE mixing
period from proton a to b and then back to proton a during the HOHAHA mixing period.
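The bookkeeping of diagonal and cross-diagonal planes can be made concrete with a small classifier. Given the three frequencies of a peak (here in ppm, with a hypothetical matching tolerance), it reports whether the peak lies on the body diagonal, on one of the three planes defined above, or off all of them. A peak off all planes reflects a genuine double transfer between three different protons. The function name and tolerance are illustrative.

```python
def classify_3d_peak(w1, w2, w3, tol=0.01):
    """Locate a NOE-HOHAHA peak (w1, w2, w3) relative to the diagonal structures."""
    eq12 = abs(w1 - w2) <= tol
    eq23 = abs(w2 - w3) <= tol
    eq13 = abs(w1 - w3) <= tol
    if eq12 and eq23:
        return "body diagonal (no transfer)"
    if eq23:
        return "NOE plane (w2 = w3)"            # transfer only in the NOE period
    if eq12:
        return "HOHAHA plane (w1 = w2)"         # transfer only in the HOHAHA period
    if eq13:
        return "back-transfer plane (w1 = w3)"  # a -> b by NOE, b -> a by HOHAHA
    return "true 3D cross peak (a -> b -> c)"


for shifts in [(8.1, 8.1, 8.1), (8.1, 4.3, 4.3), (8.1, 8.1, 4.3), (8.1, 4.3, 8.1), (8.1, 4.3, 1.2)]:
    print(shifts, classify_3d_peak(*shifts))
```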


Fig. 5. The use of ω3 cross sections at
amide frequencies to obtain sequential
assignments from 3D NOE-HOHAHA
spectra. The lines labelled N, H and
B indicate the intersection with the NOE,
HOHAHA and back-transfer planes,
respectively. The solid arrows show the
inter-residue NαN connectivity, the
broken arrows the intra-residue NαN
connectivity.


Fig. 6. Intra- and inter-residue connectivities along the backbone of a protein.


For  the  analysis  of  the  3D  spectra  we  can  take  cross  sections  perpendicular  to  one 
of the axes, e.g. the ω3 axis. The three diagonal planes will intersect this cross section
at  three  lines  indicated  in  Fig.  5  by  N  (NOE),  H  (HOHAHA)  and  B  (back  transfer).  All 
cross  peaks  outside  these  three  lines  are  due  to  a  double  magnetization  transfer 
involving  three  different  frequencies. 

We  will  now  explain  how  the  information  in  such  a  NOE-HOHAHA  spectrum  can  be 
used  for  the  assignment  of  proton  resonances  in  proteins,  which  is  a  prerequisite  for 
NMR-based structure determination. As outlined by Wüthrich (ref. 2), proton
resonances in proteins can be connected sequentially along the backbone chain by
combining J-coupling and NOE information. Thus for each amino acid a series of
intra-residue J-interactions and intra- and inter-residue NOE interactions can be
identified, cf. Fig. 6. Fig. 5 shows schematically where the intra-residue and the
sequential NαN[NOE,J] connectivities appear on an ω3 cross section of a
NOE-HOHAHA  spectrum. 
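Where the expected peaks of a backbone stretch appear can also be generated mechanically. Every NOE contact a-b in the first mixing period, plus "no transfer" (b = a), is combined with every HOHAHA (J) connectivity b-c in the second, plus c = b. The peaks are then read on the ω3 cross section at a chosen amide frequency. The sketch below does this for a hypothetical two-residue fragment. The proton labels, shifts and connectivities are invented for illustration, and transfers are treated as symmetric.

```python
def expected_peaks(protons, noe, hohaha):
    """All (a, b, c) with a -> b by NOE (or no transfer) and b -> c by HOHAHA (or none).

    protons: {label: shift in ppm}; noe, hohaha: iterables of label pairs.
    Returns {(a, b, c): (w1, w2, w3)} with w1, w2, w3 the shifts of a, b, c.
    """
    def partners(pairs):
        nbrs = {p: {p} for p in protons}   # "no transfer" keeps the same proton
        for x, y in pairs:
            nbrs[x].add(y)
            nbrs[y].add(x)                 # transfers assumed symmetric
        return nbrs

    by_noe, by_hohaha = partners(noe), partners(hohaha)
    peaks = {}
    for a in protons:
        for b in by_noe[a]:
            for c in by_hohaha[b]:
                peaks[(a, b, c)] = (protons[a], protons[b], protons[c])
    return peaks


def omega3_section(peaks, label):
    """Peaks detected on proton `label`, i.e. one w3 cross section."""
    return {k: v for k, v in peaks.items() if k[2] == label}


protons = {"HN1": 8.3, "HA1": 4.4, "HN2": 8.0, "HA2": 4.1}   # hypothetical shifts (ppm)
noe = [("HN1", "HA1"), ("HA1", "HN2"), ("HN2", "HA2")]        # intra- and sequential NOEs
hohaha = [("HN1", "HA1"), ("HN2", "HA2")]                      # intra-residue J transfer
peaks = expected_peaks(protons, noe, hohaha)
section = omega3_section(peaks, "HN2")
for (a, b, c), shifts in sorted(section.items()):
    print(a, b, c, shifts)
```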
Fig. 7. ω3 cross sections of the 3D
NOE-HOHAHA spectrum of
parvalbumin in H2O at the NH
frequencies of Ala-54 to Gly-56.


In fact many such connectivities can be identified for the N, α and β protons
along the backbone of a protein, the intensity of which depends on the secondary
structure. An example of a sequential assignment is given in Fig. 7. It shows a
stretch of NαN connectivities in the NOE-HOHAHA spectrum of pike parvalbumin in
H2O, extending from Ala-54 to Gly-56.

It is clear that the combination of NOE transfer and HOHAHA mixing will be very
useful  for  assignment  procedures  and  identification  of  secondary  structure  elements  in 
protein  NMR  spectra.  Obviously  the  increased  resolution  in  the  3D  cube  compared  to 
the  2D  plane  is  also  of  great  help  for  identifying  more  NOE  interactions,  which  can 
lead  to  a  higher  precision  in  structure  determination  or  to  solution  of  more  complex 
structures.  Recently  we  proposed  the  3D  NOE-NOE  experiment  as  a  method  to  study 
complex NOE networks (ref. 34). Apart from the fact that the method can be used to
analyze  spin  diffusion  networks  and  determine  the  cross-relaxation  matrix  with  higher 
precision,  the  technique  seems  particularly  useful  for  studying  larger  proteins.  With 
higher molecular weights the HOHAHA mixing decreases in efficiency due to the short
T1ρ relaxation time. The NOE transfer, on the contrary, becomes more efficient for
larger  proteins.  The  analysis  of  such  spectra  with  many  cross  peaks  is  still  complicated 
and  further  automation  is  highly  desirable. 
POLTEV - What can you tell us about distortion of the DNA helix structure as a result of the
interaction with the repressor? Is it possible to characterize these distortions
quantitatively by atom group displacements, by changes of base positions, torsion
angles, etc.?

RULLMANN  -  In  most  of  our  simulations  we  kept  the  DNA  rigid  in  its  energy  minimized 
B-DNA form. We did this for two reasons: firstly, the calculations were done in vacuum
with  neutralized  phosphates  and  residues.  Structural  predictions  for  DNA  are  then 
rather  speculative.  Secondly,  at  the  time  we  did  not  have  accurate  NMR  data  on  the 
operator,  except  that  it  has  the  same  NOE  pattern  as  B-DNA.  Using  IRMA  we  hope  to 
be  able  to  derive  more  quantitative  information  about  the  conformation  adopted  by  the 
operator.  Results  obtained  for  free  DNA  fragments  demonstrate  that  this  is  possible. 


MILON  -  Have  you  ever  used  the  IRMA  technique  in  order  to  check  the  hypothesis  of 
molecular rigidity, and more generally, could you comment on your approach to
internal mobility?
RULLMANN - In the calculations on the DNA octamer we have seen that the largest
constraint  violations  appeared  on  lower  bounds  involving  sugar  ring  atoms.  This 
correlates  well  with  the  assumption  that  the  sugar  rings  are  the  most  mobile  parts  of 
DNA.  Preliminary  results  obtained  by  using  mobility  data  taken  from  MD  simulations  to 
modify the correlation time locally indeed show an improvement in the constraints. For
crambin  the  mobility  effects  are  (partly)  obscured  by  the  imprecision  due  to  lack  of 
stereospecific  assignments.  However,  calculated  NOEs  tend  to  differ  most  from 
experimental values for residues in the more flexible regions of the molecule. Local
mobility  effects  are  probably  important  here. 


LAPLANTE  -  To  what  extent  have  3D  NMR  techniques  been  practically  useful  for 
example  for  assignment  purposes  ? 

RULLMANN  -  I  must  apologize  for  not  being  able  to  answer  the  question  in  detail, 
since  I  am  not  involved  in  the  NMR  work  itself.  At  the  moment  the  research  is  still  very 
RESTRAINED MOLECULAR MECHANICS OF ENKEPHALIN USING
DISTANCES  DERIVED  FROM NMR  (TRANSFERRED  NOE) 
Centre de Neurochimie, 67084 Strasbourg, France.

SUMMARY

The membrane-bound conformation of [D-Ala2]-Leu5 enkephalin has been
studied by the transferred NOE technique in the presence of perdeuterated
phospholipids. Seventy-two interproton distances have been determined and used
as constraints in a molecular mechanics energy minimization using the TRIPOS
force field. The refined model is in agreement with all the NMR data and is
characterised by a type II' β-turn around the last four residues and a γ-turn
centered around D-Ala. The membrane-bound conformation is closely related to
the activity of the peptide.

INTRODUCTION 

Leu5-enkephalin (Tyr-Gly-Gly-Phe-Leu) and Met5-enkephalin (Tyr-Gly-Gly-Phe-Met)
are endogenous peptides with morphine-like activity (ref. 1). Since their
discovery,  there  have  been  extensive  studies  in  order  to  determine  their  active 
conformation,  because  they  are  considered  to  bind  to  the  same  class  of  receptors  as 
morphine,  which  has  a  rigid  structure.  Several  studies  in  dimethyl  sulfoxide 
solution have led to the conclusion that enkephalin existed in a β-turn structure
involving  the  C-terminal  four  residues  (refs.  2,3).  Subsequent  studies  indicated 
that  enkephalin  was  probably  in  equilibrium  between  folded  and  extended 
conformations  (refs.  4,5).  It  was  later  shown  that  in  aqueous  solution  the  peptide  is 
flexible  and  primarily  takes  extended  conformations  (refs.  4,6).  Thus,  the 
conformational  analysis  of  peptides  in  solution  gave  little  clue  to  the  elucidation  of 
conformation-activity  relationship  of  enkephalins. 

On  the  other  hand,  the  importance  of  the  interaction  with  phospholipid 
membranes  is  now  well  recognized.  Thus  for  many  peptides,  the  biological 
activities have been found to be related to the affinity for phospholipid membranes
and the conformation in the membrane-bound state, rather than to the
conformation  in  aqueous  solution  (ref.  7).  In  the  particular  case  of  enkephalins, 
anionic  phospholipids  are  required  for  the  binding  to  bilayers  (ref.  8)  but  not  for 
the  binding  to  micelles  (ref.  9).  A  bilayer  membrane  made  of  neutral  and  anionic 
phospholipids  is  also  a  more  faithful  model  of  cell  membranes  than  are  micelles. 
However, the proton resonances of the membrane-bound peptide (the total
molecular weight of the peptide-vesicle assembly exceeds 10⁷) are too broad to be
analysed in detail by standard techniques. In the presence of phospholipid bilayers,
the peptide exchanges rapidly between a membrane-bound state and a free state.
Magnetization transfer is much more efficient in the membrane-bound state than in
the free state and gives negative NOEs, because the peptide-vesicle system tumbles
very slowly. The transferred NOE technique (TRNOE, refs. 10-12) therefore allows
the conformation of peptides in the membrane-bound state to be determined from the
resonances of the free peptide, even though the spectrum of the bound peptide itself
cannot be observed because of the slow motion.
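As a rough illustration of why the bound state dominates, the standard two-spin result gives a cross-relaxation rate proportional to 6J(2ω) − J(0), with J(ω) = τc/(1 + ω²τc²); it is positive for fast tumbling and becomes negative once ωτc exceeds about 1.12. In fast exchange the observed rate is the population-weighted average of the bound and free rates. The Python sketch below assumes a rigid, isotropically tumbling proton pair and uses illustrative values (400 MHz proton frequency, correlation times of 0.1 ns and 1 µs, 5 % bound peptide) that are not taken from this work.

```python
import math


def cross_relaxation(tau_c, omega):
    """Relative homonuclear cross-relaxation rate, proportional to 6*J(2w) - J(0).

    J(w) = tau_c / (1 + (w * tau_c)**2) is the spectral density of a rigid,
    isotropically tumbling proton pair. All physical prefactors (including the
    r**-6 distance factor) are dropped, so only sign and relative size matter.
    """
    j_zero = tau_c
    j_two_omega = tau_c / (1.0 + (2.0 * omega * tau_c) ** 2)
    return 6.0 * j_two_omega - j_zero


def observed_rate(p_bound, sigma_bound, sigma_free):
    """Population-weighted rate for fast exchange between bound and free peptide."""
    return p_bound * sigma_bound + (1.0 - p_bound) * sigma_free


# Illustrative assumptions, not values from the text.
OMEGA = 2.0 * math.pi * 400e6  # proton Larmor frequency (rad/s), assumed 400 MHz
TAU_FREE = 1e-10               # correlation time of the free peptide (s), assumed
TAU_BOUND = 1e-6               # correlation time of the peptide-vesicle complex (s), assumed
P_BOUND = 0.05                 # bound fraction, assumed

sigma_free = cross_relaxation(TAU_FREE, OMEGA)
sigma_bound = cross_relaxation(TAU_BOUND, OMEGA)
sigma_obs = observed_rate(P_BOUND, sigma_bound, sigma_free)
print(f"free {sigma_free:+.2e}, bound {sigma_bound:+.2e}, observed {sigma_obs:+.2e}")
```

With these assumed numbers the bound-state term outweighs the free-state term by roughly two orders of magnitude, so the averaged rate is negative and reflects the bound conformation.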

We recently determined the specific conformation of [D-Ala2]-Leu5 enkephalin,
a highly potent analog of enkephalin, in the membrane-bound state (ref. 13). We
chose [D-Ala2]-Leu5 enkephalin because this analog is 10 times more active than
Leu5-enkephalin (ref. 14) and because the alanine methyl group was expected to
reduce the conformational flexibility of the molecule and to allow a more clear-cut
analysis of the TRNOE data (the presence of two glycine residues in Leu5-enkephalin
makes it difficult to interpret TRNOE data unambiguously).

METHODS 

In the presence of vesicles made of perdeuterated phospholipids
(phosphatidylcholine-phosphatidylserine in a 1:1 molar ratio) in H2O solution,
phase-sensitive NOESY spectra were recorded at mixing times ranging from
50 to 300 ms. 1D NOE difference experiments were also performed, using the 1-1
hard pulse technique for water suppression and a mixing time of 300 ms. Because
the experiments were performed in H2O, complete sets of transferred nuclear
Overhauser effect data were obtained for pairs of N-H protons (all NH protons except
the Tyr α-NH3+ protons, which exchange rapidly with H2O protons), C-α protons (all α
protons except the Phe C-α proton, whose resonance was masked by that of H2O), C-β
protons and side-chain C-H protons. These NOEs were converted into distances assuming
a single rigid conformation (NOE proportional to d⁻⁶) and using the
tyrosine aromatic H(2)-H(3) distance as an internal reference.
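The distance calibration described above can be sketched as follows, assuming the isolated spin-pair approximation (NOE ∝ d⁻⁶, one rigid conformation, the same effective correlation time for every pair). The helper name noe_to_distance is hypothetical, and the 2.45 Å reference value for the Tyr H(2)-H(3) pair is an assumed typical ortho ring-proton distance; the text does not state the value used.

```python
def noe_to_distance(noe, noe_ref, d_ref=2.45):
    """Estimate an interproton distance (angstroms) from an NOE intensity.

    Uses the isolated-spin-pair relation d = d_ref * (NOE_ref / NOE)**(1/6),
    i.e. NOE proportional to d**-6 for a single rigid conformation.
    Transferred NOEs are negative, so magnitudes are compared.
    d_ref = 2.45 A is an assumed value for the Tyr H(2)-H(3) reference pair.
    """
    noe, noe_ref = abs(noe), abs(noe_ref)
    if noe == 0.0 or noe_ref == 0.0:
        raise ValueError("NOE intensities must be non-zero")
    return d_ref * (noe_ref / noe) ** (1.0 / 6.0)


# A cross-peak 64 times weaker than the reference corresponds to twice the distance.
print(round(noe_to_distance(-1.0 / 64.0, -1.0), 2))  # 4.9
```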

Using the SYBYL software (Tripos) running on a MicroVAX II (DEC) and a PS390
graphics station (Evans & Sutherland), a starting conformation was built, with the
dihedral angles given values close to those of the model proposed in ref. 13. The
distances derived from NMR were introduced as constraints of the form
E = k(d − d0)², where d is the interproton distance, d0 is the distance used as a
constraint and k is the force constant (k = 200 cal Å⁻², a value comparable to the
bond-stretching force constants used in the Tripos force field). The energy of the
constrained molecule was minimized with the Tripos force field and the Maximin2
minimizer, and compared with the minimized energy of the starting conformation. The
final conformation was then minimized again after removal of the constraints, in
order to assess the strain in the final molecule.
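A minimal sketch of the constraint term, assuming simple Cartesian coordinates. The text gives only the harmonic form E = k(d − d0)² with k = 200 cal Å⁻²; handling the lower limits of Table 1 (and its single range) with a one-sided, flat-bottomed penalty is an assumption made here, and the function and atom names are hypothetical.

```python
import math

K = 200.0  # force constant quoted in the text (cal per square angstrom)


def restraint_energy(d, lower, upper=math.inf, k=K):
    """Penalty k*(d - bound)**2 outside [lower, upper], zero inside.

    An exact NMR distance d0 is encoded as lower = upper = d0, which gives
    E = k*(d - d0)**2; a lower limit such as "> 4" uses upper = inf.
    """
    if d < lower:
        return k * (d - lower) ** 2
    if d > upper:
        return k * (d - upper) ** 2
    return 0.0


def total_restraint_energy(coords, restraints, k=K):
    """Sum over (atom_i, atom_j, lower, upper); coords maps atom names to (x, y, z)."""
    return sum(
        restraint_energy(math.dist(coords[i], coords[j]), lo, hi, k)
        for i, j, lo, hi in restraints
    )


# Hypothetical atom names and coordinates (angstroms).
coords = {"HN_Ala2": (0.0, 0.0, 0.0), "HN_Gly3": (3.1, 0.0, 0.0), "HA_Tyr1": (0.0, 3.0, 0.0)}
restraints = [("HN_Ala2", "HN_Gly3", 3.0, 3.0), ("HN_Ala2", "HA_Tyr1", 3.6, math.inf)]
print(round(total_restraint_energy(coords, restraints), 3))  # 74.0
```

The first restraint is off by 0.1 Å (2 units) and the second violates its 3.6 Å lower limit by 0.6 Å (72 units).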

RESULTS  AND  DISCUSSION 

After  conversion  of  the  NOE  data  into  distances,  a  set  of  17  interproton 
distances  and  of  53  lower  limits  of  interproton  distances  was  available  for 
restrained  molecular  mechanics  energy  minimization  (table  1). 

TABLE  1 

Distances used as constraints in the energy minimization (in Å)ᵃ

d: NH Ala - NH Gly = 3.0; NH Leu > 4; NH Phe > 4; Ar Tyr > 4; αLeu > 4; αTyr > 3.6; αAla = 2.6; αGly > 4; βTyr = 3.5; βPhe > 4; βLeu > 4; CH3 Ala = 3.1
d: NH Gly - NH Phe = 3.3; Ar Phe > 4; Ar Tyr > 4; αLeu > 4; αTyr > 4; αAla = 3.1; αGly = 2.7; βTyr > 4; βPhe > 4; βLeu > 4; CH3 Leu > 4; CH3 Ala = 3.1
d: NH Leu - NH Phe = 2.6; Ar Phe = 3.9; Ar Tyr > 4; αLeu = 2.8; αTyr > 4; αAla > 4; αGly > 4; βTyr > 4; βPhe > 4; CH3 Ala > 4
d: NH Phe - Ar Phe = 3.7; Ar Tyr > 4; αLeu > 4; αTyr > 4; αAla > 4; αGly = 2.6; βTyr > 4; βPhe = 3.0; βγ Leu = 3.5; CH3 Ala > 4
d: Ar Phe - Ar Tyr > 4; βTyr > 4; αLeu > 4; αTyr > 4; αAla > 4; αGly > 4
d: Ar Tyr - αLeu > 4; αTyr > 4; αAla > 4; αGly > 4; βPhe > 4
d: αLeu - αTyr > 4; CH3 Ala > 4; αAla > 4; αGly > 4; βTyr > 4; βPhe > 4
d: αTyr - αAla > 4; αGly > 4; βPhe > 4
d: αAla - αGly > 4; βPhe > 4
d: αGly - βTyr > 4; βPhe = 3.7
d: βTyr - CH3 Ala > 4
d: βγ Leu - 3.7 < CH3 Ala < 4

ᵃ When several protons were involved, as for α Gly for instance, the closest
proton was taken.

From the NOE data, a model was built "by hand" using molecular models. It
was used as the starting conformation for the energy minimization process. Random
conformations and structures derived from X-ray data were also tested. They
generally caused problems during the energy minimization (rising energies)
and/or led to very distorted structures with unrealistic bond angles and bond
lengths and very high energies. In this approach, the validity of the model was
demonstrated by the fact that the energy of the constrained molecule was close to
the energy of the structure minimized in the absence of constraints. After energy
minimization the constrained distances were equal to those indicated in Table 1
to within 0.1 Å.
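The acceptance test used here (every constrained distance reproduced to within 0.1 Å after minimization) can be written as a short check. This sketch treats exact distances as equal lower and upper bounds and lower limits as having an unbounded upper side; the function names and example distances are hypothetical.

```python
import math


def max_violation(distances, bounds):
    """Largest excursion (angstroms) of any distance outside its [lower, upper] bounds.

    Returns 0.0 when every distance lies within its bounds.
    """
    if len(distances) != len(bounds):
        raise ValueError("one (lower, upper) pair is needed per distance")
    worst = 0.0
    for d, (lower, upper) in zip(distances, bounds):
        worst = max(worst, lower - d, d - upper)
    return worst


def restraints_satisfied(distances, bounds, tol=0.1):
    """True if every restrained distance is reproduced to within tol angstroms."""
    return max_violation(distances, bounds) <= tol


# Hypothetical model distances against an exact value, a lower limit and a range.
bounds = [(3.0, 3.0), (4.0, math.inf), (3.7, 4.0)]
print(restraints_satisfied([3.05, 4.2, 3.8], bounds))  # True
print(restraints_satisfied([3.2, 4.2, 3.8], bounds))   # False
```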

From the refined structure a coordinate file was created (.mol file, which is
available upon request) and used by the Moldraw software (J.M. Cense) on a
Macintosh computer to generate the ball-and-stick model shown in figure 1.




Fig.  1.  Membrane  bound  conformation  of  (D-Ala2)-Leu5  enkephalin  as  determined 
by  the  combined  use  of  transferred  NOE  and  restrained  energy  minimization. 

The refined model obtained after energy minimization was found to be close to
the model guessed on the basis of molecular models, and the main conclusions were
confirmed. In particular, the type II' β-turn involving D-Ala, Gly, Phe and Leu, the
γ-turn around D-Ala and the position of the tyrosine apart from the rest of the
molecule were preserved. The membrane-bound conformation of the enkephalins
closely approximates that of highly active cyclic analogs (ref. 15), and
appears to be correlated with activity (ref. 13).

The use of molecular mechanics has confirmed that our model is consistent
with all the NOE data, and it is therefore reasonable to believe that there is one
major membrane-bound conformation of the enkephalins.
DISCUSSION


FAUPEL - Membrane conformation of Leu-enkephalin: how is it incorporated in
the membrane? That is, you have found the conformation, but how is it oriented in the
membrane?

MILON - As we are using perdeuterated phospholipids, we get no information at the
moment about the position in the membrane. Only the membrane-bound conformation
of the peptide is determined. We are currently developing new methodologies to
answer this very important question.




Modelling  of  Molecular  Structures  and  Properties.  Proceedings  of  an  International  Meeting, 
Nancy,  France,  11-15  September  1989,  J.-L.  Rivail  (Ed.) 

Studies  in  Physical  and  Theoretical  Chemistry,  Volume  71,  pages  685-693 
©  1990  Elsevier  Science  Publishers  B.V.,  Amsterdam  —  Printed  in  The  Netherlands 


PRELIMINARY  STRUCTURE  DETERMINATION  OF  TWO  TOXINS  USING  NMR  DATA 

A. MIKOU1, S. LAPLANTE1, R. LE GOAS2, M.-A. DELSUC2, I. CHARPENTIER3,
E. GUITTET1* AND J.-Y. LALLEMAND2.

1Laboratoire de RMN, ICSN, CNRS, 91190 Gif sur Yvette (France).

2Laboratoire  de  RMN,  DCSO,  Ecole  Polytechnique,  91128  Palaiseau  Cedex  (France). 
3Service  de  Biochimie,  CEN,  91191  Saclay,  Gif  sur  Yvette  (France). 

SUMMARY 

A rapid means of determining a preliminary three-dimensional structure of a cobra
toxin (71 aa) using NMR data is described. First, secondary structures were easily
assigned using characteristic MCD patterns. Distance constraints from these secondary
structures were then applied in restrained energy minimization and molecular dynamics
calculations to provide a preliminary three-dimensional reconstruction of the cobra toxin.
A similar study is currently under way for a scorpion toxin (64 aa).

Abbreviations: NMR, nuclear magnetic resonance; MCD, main chain directed;
HOHAHA, homonuclear Hartmann-Hahn (total correlation) spectroscopy; COSY, correlated
spectroscopy; NOESY, nuclear Overhauser effect spectroscopy.

INTRODUCTION 

NMR has become an important tool for determining the three-dimensional structures of
small proteins in solution. The key factor involved is the nuclear Overhauser effect, which
reveals short- and long-range interproton distances of less than 5 Å. To determine the
possible conformations that satisfy all these distance constraints, restrained energy
minimization and molecular dynamics calculations are needed (refs. 1, 2). However,
many difficulties quickly arise. One of the most serious is the assignment of the
cross-peaks in the 2D NOE maps to their parent protons.

Several methods have been proposed to solve this assignment problem. The
sequential assignment method (ref. 3) first identifies all or most amino acid
residues using HOHAHA and COSY experiments, which reveal through-bond
connectivities. Using αH-NH NOE information, the neighboring residues can then be
identified. The resulting short segments of neighboring residues are then matched
against the corresponding segments of the primary sequence. Secondary structures are
thus identified only after the amino acid assignment has been completed.

The recently proposed main-chain directed (MCD) assignment strategy (refs. 4, 5),
on the other hand, requires defining all the main-chain spin systems, followed by
recognition of NOE connectivity patterns that are characteristic of secondary structures
(helices, sheets, turns and extended chain). The recognition of a few amino acid types
then places the discovered secondary structure within the polypeptide sequence. Unlike the
sequential assignment approach, the MCD strategy does not rely on the difficult task of
recognizing many side-chain spin systems in the crowded side-chain regions of the
NMR maps.

In this paper, we have used MCD patterns to characterize the secondary structures of
both the cobra and the scorpion toxins. In the case of the cobra toxin, distance
constraints from the discovered secondary structures were then applied in restrained
calculations to determine a preliminary three-dimensional structure.

METHODS 
Sample preparation

The solutions of the cobra toxin (α-cobratoxin, one of the long-chain toxins of Naja naja
siamensis) and the scorpion toxin (AaH III toxin from Androctonus australis Hector) were
unbuffered and contained in 5-mm NMR sample tubes. The protein concentrations were
3 mM and 2.5 mM for the cobra and the scorpion toxins, respectively.

Nuclear  Magnetic  Resonance 

All 1H NMR experiments were performed at 400 and 600 MHz on Bruker AM
spectrometers. The data were transferred to a Vax GPX II station and
subsequently processed with the GIFA software (ref. 6).

Energy Calculations

All  the  calculations  were  performed  on  an  IBM  3090  VF  system  using  the  AMBER 
software  (refs.  7-8). 

DISCUSSION 

The MCD patterns proved particularly useful for characterizing secondary
structures such as β-sheets and helices.

The β-sheet of the cobra toxin was characterized by first observing the readily
apparent αH-αH cross-peak shown as peak number 1 in the NOESY spectrum in Fig. 1.
The αH-αH peaks are usually more visible in spectra taken in D2O. Peak number 1 was
then used as the origin for locating the inner-loop MCD pattern shown as thick, numbered
lines. The numbers coincide with the numbered peaks in the inset. Aligned along
both αH chemical shifts of peak 1, four cross-peaks can be found which form a
rectangle (peaks 2, 3, 4 and 5). This rectangle has alternating strong
(peaks 2 and 4) and weak (peaks 3 and 5) cross-peak intensities. For this analysis, only
the NH-αH part of each spin system needed to be defined in order to distinguish
inter-residue from intra-residue cross-peaks. Connected inner loops and loop subsets (see
ref. 4 for further discussion of loop subsets) were subsequently found, thereby extensively
defining a triple-stranded antiparallel sheet. Aligning this complete pattern with the
primary sequence for assignment purposes was straightforward, since only a few easily
identifiable side-chain spin systems (e.g. Val, Thr, Gly) needed to be used.
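Placing a main-chain pattern in the sequence amounts to finding where the few identified residue types fit. The Python sketch below is a simplified, assumed illustration: it treats one segment as a linear pattern ('?' marks residues known only from their NH-αH spin system) and ignores the strand pairing that the full MCD analysis also exploits. The sequence and patterns are hypothetical, not the cobratoxin sequence.

```python
def place_pattern(sequence, pattern):
    """Return 1-based start positions where `pattern` fits `sequence`.

    `pattern` uses one-letter codes for residues whose side-chain spin
    systems were identified and '?' for residues known only from their
    main-chain (NH-alphaH) spin system.
    """
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        if all(p == "?" or p == s for p, s in zip(pattern, window)):
            hits.append(start + 1)
    return hits


# Hypothetical sequence and patterns, for illustration only.
seq = "MKTVGTAVGTRQKVGT"
print(place_pattern(seq, "V?T"))   # [4, 8, 14]
print(place_pattern(seq, "V?TR"))  # [8]
```

Identifying one more residue type (here an Arg) reduces three candidate positions to one, which is the sense in which only a few side-chain spin systems are needed.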

MCD patterns can also be used to identify helical secondary structure. The basic
MCD pattern of the α-helix consists of a series of closed loops; a single loop is formed
by a strong NH-NH NOE connectivity together with intra-residue and inter-residue NH-βH
connectivities (ref. 4). By connecting five of these basic loops, we characterized a
mini-helix in the cobra toxin. The helix was further confirmed by the presence of
αH-NH(i,i+3) and αH-βH(i,i+3) NOE connectivities (ref. 3). The amino acids within the helix were
assigned by aligning the complete pattern with the primary sequence.
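For reference, the helix connectivities mentioned above can be enumerated for a candidate segment and compared with an observed NOE list. This is an illustrative sketch covering only the NH(i)-NH(i+1), αH(i)-NH(i+3) and αH(i)-βH(i+3) contacts named in the text (the NH-βH loop contacts are omitted); the tuple format, function names and the "observed" data are hypothetical.

```python
def expected_helix_noes(first, last):
    """Connectivities expected for a helix spanning residues first..last (inclusive).

    Covers the strong NH(i)-NH(i+1) contacts and the medium-range
    alphaH(i)-NH(i+3) and alphaH(i)-betaH(i+3) contacts.
    Each NOE is a tuple (proton_i, residue_i, proton_j, residue_j).
    """
    noes = []
    for i in range(first, last + 1):
        if i + 1 <= last:
            noes.append(("NH", i, "NH", i + 1))
        if i + 3 <= last:
            noes.append(("HA", i, "NH", i + 3))
            noes.append(("HA", i, "HB", i + 3))
    return noes


def missing_noes(first, last, observed):
    """Expected helix connectivities that do not appear in the observed set."""
    return [noe for noe in expected_helix_noes(first, last) if noe not in observed]


expected = expected_helix_noes(29, 35)
print(len(expected))  # 14
observed = set(expected) - {("HA", 29, "NH", 32)}  # hypothetical data set
print(missing_noes(29, 35, observed))  # [('HA', 29, 'NH', 32)]
```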
Fig. 1. 600 MHz NOESY subspectrum (300 ms mixing time) of cobra toxin in D2O. The
numbers beside the cross-peaks correspond to the numbered striped lines of the full-loop
MCD pattern in the inset.


The MCD-based assignments cover 21 residues involved in the triple-stranded
antiparallel β-sheet (strand 1: residues 52 to 58; strand 2: 19 to 25; strand 3: 36 to 42)
and 7 residues involved in the mini-helix (residues 29 to 35), that is, altogether 28 of
the 71 amino acids, more than one third of the cobra toxin sequence.

The same triple-stranded antiparallel β-sheet is readily observed in the X-ray
structure; however, the mini-helix characterized by the MCD patterns is not mentioned
in the description of the crystal structure (ref. 9).

The scorpion toxin was subjected to a similar analysis: a single triple-stranded
antiparallel β-sheet and an α-helix of two and a half turns were found. Although its
crystal structure has not been determined, toxins of the same family also exhibit a
β-sheet and an α-helix in similar regions of the primary sequence, as revealed by NMR
(ref. 10) and X-ray studies (refs. 11-13).

In the case of the cobra toxin, a set of 70 NMR distance constraints collected from
the secondary structures was used as input for energy minimization and molecular
dynamics calculations with the AMBER program (refs. 7, 8). This study allowed us to
determine a preliminary three-dimensional structure of the cobra toxin.

The X-ray structure was used as the starting point for all these calculations. In the
first calculation, the X-ray structure was energy minimized using a harmonic potential:

V(d) = 0.5 kd (d − dmes)²

where dmes is an estimated distance based on the observed NOE between the parent
protons and d is the corresponding interproton distance in the current structure
(refs. 14, 15). For a strong, a medium and a small NOE, dmes was taken as 2.4 Å, 2.9 Å
and 3.75 Å, respectively, and kd as 25.0, 17.5 and 11.0 kcal mol⁻¹ Å⁻². For distances
involving methylene protons (for which stereospecific assignments remain to be made),
kd was taken as 1.6 kcal mol⁻¹ Å⁻².
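The restraint parameters quoted above can be collected in a small lookup table. This Python sketch encodes the text's values (with the force constants read as kcal mol⁻¹ Å⁻²); the function names are hypothetical, and it assumes that for methylene pairs the intensity-based dmes is kept and only kd is reduced, which the text does not state explicitly.

```python
# NOE intensity class -> (d_mes in angstroms, k_d in kcal mol^-1 A^-2), from the text.
NOE_CLASSES = {
    "strong": (2.4, 25.0),
    "medium": (2.9, 17.5),
    "small": (3.75, 11.0),
}
METHYLENE_KD = 1.6  # weaker k_d for pairs involving non-stereoassigned CH2 protons


def restraint_parameters(intensity, involves_methylene=False):
    """Return (d_mes, k_d) for an NOE class; methylene pairs get the reduced k_d.

    Keeping the intensity-based d_mes for methylene pairs is an assumption.
    """
    d_mes, k_d = NOE_CLASSES[intensity]
    if involves_methylene:
        k_d = METHYLENE_KD
    return d_mes, k_d


def restraint_potential(d, d_mes, k_d):
    """Harmonic restraint V(d) = 0.5 * k_d * (d - d_mes)**2 for the current distance d."""
    return 0.5 * k_d * (d - d_mes) ** 2


d_mes, k_d = restraint_parameters("strong")
print(restraint_potential(3.4, d_mes, k_d))  # 12.5
```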

For the restrained molecular dynamics, the following three steps were performed:
-the distance restraints were first introduced and the starting structure was again energy
minimized;

-a first run of molecular dynamics (1.25 ps at 300 K, in 0.5 fs steps)
equilibrated the system thermally;

-a second run of molecular dynamics (5 ps at 300 K, in 0.5 fs steps) was then carried
out. The average structure was then computed, minimized and analyzed.
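The run lengths translate into integration steps as duration divided by time step, i.e. 1.25 ps / 0.5 fs = 2500 steps for equilibration and 5 ps / 0.5 fs = 10 000 steps for the second run. The sketch below shows that arithmetic and a plain coordinate average over frames; it assumes the frames have already been superimposed on a common reference, and its helper names are hypothetical. A coordinate average can have distorted covalent geometry, which is one reason such an average is usually energy minimized afterwards.

```python
def n_steps(duration_ps, timestep_fs):
    """Number of integration steps for a run of the given length."""
    return round(duration_ps * 1000.0 / timestep_fs)


def average_structure(frames):
    """Mean Cartesian coordinates over trajectory frames.

    Each frame is a list of (x, y, z) tuples in the same atom order; frames are
    assumed to be already superimposed on a common reference.
    """
    n = len(frames)
    return [
        tuple(sum(frame[a][c] for frame in frames) / n for c in range(3))
        for a in range(len(frames[0]))
    ]


print(n_steps(1.25, 0.5), n_steps(5.0, 0.5))  # 2500 10000
```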


Fig. 2. Superposition of the restrained energy minimized mean structure with the
initial X-ray structure (backbone) of α-cobratoxin.

Among the salient features of the reconstructed structures, compared with the
starting X-ray structure, one should first note the striking similarity of the two structures
with respect to overall shape, size, polypeptide fold and secondary structure (Figure
3). In more detail, one can also mention:

- a good overall agreement between the X-ray and the calculated structures in the
triple-stranded antiparallel β-sheet, although the strand extending from residue 36 to
residue 40 is more affected (Figure 4).


Fig. 4. Atomic RMS difference between the restrained energy minimized mean
structure and the initial X-ray structure of α-cobratoxin. Diamonds represent the average
RMS over all the atoms, whereas squares represent the corresponding RMS over the
backbone.


- the presence of a fairly canonical mini-helix between residues 29 and 35, which
is hardly recognizable in the crystal structure (Figure 4).

-  the  marked  differences  between  the  two  structures,  significantly  enough,  occur  in 
regions  where  intermolecular  interactions  are  expected  in  the  crystal  (Phe-29  of  one 
molecule  is  reported  to  "protrude  and  tuck  into  the  arch  formed  by  the  tail  residues  (63- 
68)  and  the  underside  of  the  first  loop  (residues  3-10)  of  the  neighboring  molecule")  (ref. 
9). 
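The per-residue RMS differences plotted in Fig. 4 can be computed along these lines. The sketch assumes the two structures have already been superimposed (as in Fig. 2) and that atoms are keyed by (residue number, atom name); the PDB-style backbone atom names N, CA, C, O and the example coordinates are assumptions, and no least-squares fitting is performed here.

```python
import math
from collections import defaultdict

BACKBONE = {"N", "CA", "C", "O"}  # assumed backbone atom names (PDB style)


def per_residue_rms(model, reference, backbone_only=False):
    """RMS coordinate difference per residue between two superimposed structures.

    model, reference: dicts mapping (residue_number, atom_name) -> (x, y, z).
    Only atoms present in both structures are compared.
    """
    sum_sq = defaultdict(float)
    count = defaultdict(int)
    for key, xyz in model.items():
        if key not in reference:
            continue
        residue, atom = key
        if backbone_only and atom not in BACKBONE:
            continue
        sum_sq[residue] += sum((a - b) ** 2 for a, b in zip(xyz, reference[key]))
        count[residue] += 1
    return {r: math.sqrt(sum_sq[r] / count[r]) for r in sorted(sum_sq)}


# Hypothetical two-atom residue: the side-chain atom moved by 5 A, the CA did not.
model = {(30, "CA"): (0.0, 0.0, 0.0), (30, "CB"): (3.0, 4.0, 0.0)}
xray = {(30, "CA"): (0.0, 0.0, 0.0), (30, "CB"): (0.0, 0.0, 0.0)}
print({r: round(v, 3) for r, v in per_residue_rms(model, xray).items()})  # {30: 3.536}
print(per_residue_rms(model, xray, backbone_only=True))  # {30: 0.0}
```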

CONCLUSION 

Because the analysis of the NMR data is incomplete, the present results allow only
preliminary and cautious conclusions to be drawn. They clearly demonstrate, however, that
a rough analysis of the NMR data set, based merely on the MCD approach, gives a
first insight into the three-dimensional structure of small proteins, and a first
refinement of available X-ray structures when their resolution is relatively poor.
DISCUSSION


SURCOUF - How many NOE restraints did you use in your dynamics calculations?

MIKOU - In the case of α-cobratoxin, the distance constraints were collected from
the antiparallel β-sheet and the small helix. About 70 interproton distances were used
in the restrained calculations.


MORIZE - What is the resolution of the X-ray structure? And why are the RMS values
so large (about 6 Å for a helix between the model structure and the X-ray structure)?

MIKOU - Unfortunately the X-ray structure has a rather low resolution, about 2.8 Å.

Given this low resolution, we were interested in refining the structure using NMR data.
This is why we used the X-ray structure as a starting point in restrained molecular
dynamics calculations.

Now, concerning the graph which represents the RMS between the X-ray structure and
the model structure (X-ray structure + restrained molecular dynamics calculations using
the NMR distance constraints), we can see that the antiparallel β-sheet of
α-cobratoxin remains relatively stable during the calculations.

However, the higher RMS values indicate that the large differences occur in the
region between residues 29 and 35.

This region corresponds to the mini-helix, which was characterized by MCD patterns but
was not mentioned in the publication of the X-ray structure (M.D. Walkinshaw et al.,
Proc. Natl. Acad. Sci. USA 77 (1980) 2400-2404).

When the NMR distance constraints collected from this region were applied in the
restrained molecular dynamics, we obtained a small helix, whereas in the crystal
structure it is not as well formed.


PEPE - Several structures of snake venom toxins have been solved by X-ray analysis; they
are very similar and differ only in the length of the large loop. These structures do
not show the α-loop indicated by the NMR analysis performed on one of these
toxins. Let me point out that α-loops are very well resolved by X-ray analysis.
MIKOU - As I mentioned, the distance constraints from the helix region were applied in
the restrained calculations. We found that the local structure satisfying all
these conformational restraints was a mini-helix. This helix is not mentioned in the X-ray
structure, probably because of the low resolution (2.8 Å). Moreover, intermolecular
interactions are expected in this region of the crystal, and these probably
destabilize the small helix.


WODAK  (comment)  -  A  detailed  analysis  of  contacts  between  molecules  in  the  cobra 
toxin  crystal  structure  could  be  useful  in  explaining  the  difference  between  the  solution 
and  crystal  structures  of  this  protein.  It  may  well  be  possible  that  the  region  shown  to 
adopt  a  helical  conformation  in  solution  may  be  prevented  from  doing  so  in  the  crystal 
due  to  interactions  with  neighbouring  molecules,  a  fact  which  one  tends  to  forget. 
SUMMARY

The structures of the complexes formed by ditercalinium (a rigid
7H-pyrido[4,3-c]carbazole dimer) and by a ditercalinium analogue with a flexible chain
with the self-complementary octanucleotide d(TpTpCpGpCpGpApA)2 have been
investigated by 400 MHz 1H and 160 MHz 31P NMR. Nearly all protons and all
phosphorus atoms were assigned by 2D NMR, and intermolecular NOEs
unambiguously show bis-intercalation of both drugs into the octanucleotide.

INTRODUCTION 

7H-Pyrido[4,3-c]carbazole dimers of the ditercalinium family are DNA bis-intercalators
that display high DNA affinity and antitumor properties (ref. 1). Their antitumor
properties depend on substitution on the pyridocarbazole rings and/or on the "flexibility"
of the linking chain (ref. 2). When the two piperidine rings of ditercalinium (figure 1a) are
replaced by six methylene groups (as shown in figure 1b), the drug's cytotoxicity for
L1210 cells disappears and it no longer displays antitumor properties in mice.
However, the DNA affinity constant K is 2·10⁷ M⁻¹ for ditercalinium and 1·10⁸ M⁻¹ for the
flexible analogue.

To better understand this inverse order of affinities and cytotoxicities, the
interaction of ditercalinium and its "flexible" analogue with both the tetranucleotide
d(CpGpCpG)2 and the octanucleotide d(TpTpCpGpCpGpApA)2 has been investigated
by 1H and 31P NMR, first with a 1:2 drug-to-helix ratio to identify the bound 31P
signals by 2D exchange spectroscopy, and then with a 1:1 drug-to-helix ratio for
structural studies. The octanucleotide was chosen because it has only one site available for
bis-intercalation and is long enough to probe long-distance perturbations. Previous
studies on ditercalinium have shown that it bis-intercalates into the sequence CpGpCpG
with one excluded site (GpC) (refs. 3, 4).




MATERIALS  AND  METHODS 
MATERIALS 

The oligonucleotide samples were chromatographed over Chelex 100 resin to
remove divalent metal ions. The samples (nucleotides and drugs) were then lyophilized
twice from D2O prior to use and redissolved in 0.05 M deuterioacetate buffer
(from CEA deuterioacetic acid), pH 5.8, to a final concentration of 1 mM in duplex.
Concentrated drug solution was gradually added to the oligonucleotide solutions, first to a
1:2 drug-to-helix ratio for identification of bound 1H and 31P signals by 2D exchange
spectroscopy, and then to a 1:1 drug-to-helix ratio for structural studies. All
experiments were carried out at 295 K, except for the 2D 31P exchange experiments, which
were recorded at various temperatures from 295 K to 315 K.
1H NMR

Proton NMR spectra were obtained using a 400 MHz Bruker AM spectrometer
equipped with an ASPECT 3000 computer. Mixing times from 50 ms to 300 ms were
used in the 2D NOESY experiments.

31P NMR

Phosphorus spectra were obtained with a selective 5 mm phosphorus-proton dual
probe at 162 MHz, under the following experimental conditions. 1D spectra were
recorded with broad-band proton decoupling, using a composite pulse decoupling scheme
applied during acquisition. The two-dimensional 31P{1H} chemical shift correlation
experiments were performed using polarization transfer from proton to phosphorus
via the phosphorus-proton coupling constant, or using the reverse experiment
with transfer from phosphorus to proton. The two-dimensional 31P-31P chemical
exchange experiments were improved by composite pulse decoupling during
acquisition, and mixing times of 300 and 500 ms were used.

RESULTS 

1H resonance assignments

In all cases, one of the major features of the 1H spectra was that the symmetry of the
self-complementary oligonucleotides was not perturbed by drug binding (as seen
previously for ditercalinium in d(CGCG)2, ref. 3): only one signal could be seen
for each pair of symmetry-related protons on the two nucleotide strands, and the protons
of the dimeric drug remained symmetric in the complexes.

The 2D COSY experiments led to the assignment of all spin systems in the free and
bound nucleotides, and the 2D NOESY experiments allowed the nucleotides to be
ordered through sequential assignment. In the case of d(CpGpCpG)2, only the free
tetranucleotide and its complex with the flexible analogue were investigated, the
complex with ditercalinium having been studied previously by Delbarre et al. (ref. 3).
Bound tetranucleotide protons were assigned from exchange peaks in 2D NOESY spectra of
the 1:2 drug-to-helix solution, which correlate free (previously assigned) and
bound protons. For the octanucleotide complex, no exchange peaks were observed
under the same conditions (temperature 295 K, mixing time 300 ms), so identification
of spin systems and sequential assignment were carried out using the 1:1 drug-to-helix
solution. For the tetranucleotide complex, a 1:1 drug-to-helix ratio could
not be reached because the complexes precipitated.

The  proton  chemical  shifts  in  all  complexes  (Tables  1  and  2)  show  that  the  aroma¬ 
tic protons of both nucleotide and drug are nearly equally shifted for ditercalinium
complexes and analogue complexes, except for drug proton H5 (and H8 and H9 in
the octanucleotide complexes). In contrast, nucleotide ribose protons differed much
more between ditercalinium complexes and analogue complexes. For d(CGCG)2
complexes, the largest chemical shift differences observed between ditercalinium
complex and analogue complex were those of the internal cytosine and guanine ribose
protons. For d(TTCGCGAA)2 complexes, the largest differences occur for the
cytosine 3 and guanine 4 ribose protons. Interestingly, these latter protons are on
the same side of the intercalation site as drug proton H5, which is also differently
shifted in the two complexes (see above). The similarity of the chemical shifts of the
aromatic protons suggests a similar geometry at the intercalation sites for the
ditercalinium and analogue complexes.

TABLE 1

Chemical shifts (δ, ppm) of the octanucleotide, free and in complexes

Ribose protons    free   analog    ditercalinium   Δδ*
                         complex   complex
1Thy H1'          6.05   5.87      5.87            0.00
1Thy H2'          2.20   2.23      2.20            0.03
1Thy H2"          2.57   2.45      2.45            0.00
2Thy H1'          6.17   5.93      5.93            0.00
2Thy H2'          2.28   2.22      2.16            0.06
2Thy H2"          2.58   2.43      2.42            0.01
3Cyt H1'          5.61   5.83      5.62            0.21
3Cyt H2'          2.13   2.27      2.29            0.02
3Cyt H2"          2.42   2.70      2.92            0.22
4Gua H1'          5.88   5.33      5.38            0.05
4Gua H2'          2.69   2.49      2.31            0.18
4Gua H2"          2.63   2.49      2.31            0.18
5Cyt H1'          5.58   5.95      5.92            0.02
5Cyt H2'          1.77   2.27      2.27            0.00
5Cyt H2"          2.21   2.54      2.54            0.00
6Gua H1'          5.33   5.22      5.13            0.09
6Gua H2'          2.56   2.52      2.50            0.02
6Gua H2"          2.60   2.52      2.50            0.02
7Ade H1'          5.98   5.88      5.88            0.01
7Ade H2'          2.56   2.47      2.46            0.01
7Ade H2"          2.76   2.69      2.92            0.24
8Ade H1'          6.19   6.16      6.13            0.02
8Ade H2'          2.52   2.51      2.50            0.01
8Ade H2"          2.33   2.33      2.32            0.02

Aromatic protons  free   analog    ditercalinium   Δδ*
                         complex   complex
1Thy CH3          1.78   1.65      1.64            0.01
1Thy H6           7.61   7.60      7.57            0.03
2Thy CH3          1.73   1.62      1.60            0.02
2Thy H6           7.62   7.59      7.57            0.02
3Cyt H5           5.73   5.55      5.51            0.04
3Cyt H6           7.57   7.70      7.65            0.05
4Gua H8           7.92   7.64      7.65            0.01
5Cyt H5           5.34   5.09      5.07            0.02
5Cyt H6           7.24   7.27      7.26            0.01
6Gua H8           7.83   7.64      7.65            0.01
7Ade H8           8.06   7.96      7.95            0.00
8Ade H8           8.07   8.01      8.01            0.00

*Δδ = |δ(analog) - δ(ditercalinium)|

TABLE 2

Chemical shifts (δ, ppm) of drug aromatic protons in complexes

Proton   analog   ditercalinium   Δδ*
H1       9.50     9.55            0.05
H3       8.21     8.24            0.03
H4       7.67     7.59            0.08
H5       7.51     7.29            0.22
H6       8.00     7.92            0.08
H8       6.50     6.59            0.09
H9       6.22     6.12            0.10
H11      7.31     7.28            0.03
CH3      3.69     3.63            0.06

*Δδ = |δ(analog) - δ(ditercalinium)|
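The Δδ column of the tables is simply the absolute difference between the shifts in the two complexes. As a minimal Python sketch, the snippet below recomputes Δδ for a few ribose protons copied from Table 1 and lists those above a cut-off; the 0.15 ppm threshold and the helper names are hypothetical choices for illustration, not values used by the authors.

```python
# Chemical shifts (ppm) copied from Table 1 for a few ribose protons:
# name -> (analogue complex, ditercalinium complex)
SHIFTS = {
    "3Cyt H1'": (5.83, 5.62),
    '3Cyt H2"': (2.70, 2.92),
    "4Gua H2'": (2.49, 2.31),
    "5Cyt H2'": (2.27, 2.27),
    "6Gua H1'": (5.22, 5.13),
}

# Hypothetical cut-off for calling a proton "differently shifted" (not from the paper).
THRESHOLD_PPM = 0.15


def delta_delta(analog: float, diter: float) -> float:
    """|delta(analog) - delta(ditercalinium)|, rounded to the table's 0.01 ppm precision."""
    return round(abs(analog - diter), 2)


dd = {name: delta_delta(*pair) for name, pair in SHIFTS.items()}
# Protons at or above the threshold, largest difference first.
flagged = sorted((n for n, v in dd.items() if v >= THRESHOLD_PPM), key=lambda n: -dd[n])
print(flagged)
```

Rounding to two decimals mirrors the precision of the printed table, so the recomputed values can be compared directly with the Δδ column.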

In the two octanucleotide complexes, intermolecular NOEs were observed between
drug aromatic protons H8 (and H9) and H1' (and H2', H2") of cytosine 5, between
drug H4 (and H5) and H1' of cytosine 3 (all these interacting in the minor groove), and
between drug O-CH3 and H5 of cytosine 5 (major groove).

The observed intermolecular NOEs clearly indicate bis-intercalation via the major groove
(see Fig. 2).


Fig. 2: Preliminary model of the ditercalinium-d(TTCGCGAA)2 complex, with the
intermolecular NOEs indicated.


31P resonance assignments

The phosphorus signals were assigned through 2D 31P{1H} heteronuclear
correlation experiments in the free nucleotides, whereas the bound phosphorus atoms
were assigned by 2D 31P exchange experiments.

(i) Octanucleotide complexes. In the octanucleotide complexes (the only ones
for which a 1:1 drug-to-helix ratio could be obtained), the signals in the analogue
complex were much broader than in the ditercalinium complex (Fig. 4). 2D 31P
exchange experiments were carried out on all complexes (1:2 drug-to-nucleotide ratio),
but the analogue-octanucleotide complex did not show exchange between the 31P signals
of free and bound nucleotide even at 315 K, suggesting a longer lifetime of the drug on
the nucleotide (in agreement with the greater affinity measured with DNA). Thus, only the
31P resonances of the ditercalinium-octanucleotide complex were unambiguously assigned.
The two phosphorus atoms at the CpG intercalation site were downfield shifted (0.5 ppm
and 0.9 ppm), as was the "inter-site" GpC phosphorus (0.5 ppm). These results agree
with previous results obtained on different nucleotides (ref 3). In the analogue
complex, the two "site" phosphorus atoms probably correspond to the two downfield
shifted signals (as in the ditercalinium complex), but the "inter-site" phosphorus is
less shifted.


Figure 4: 31P spectra of the d(TTCGCGAA)2-analogue complex, free d(TTCGCGAA)2 and
the d(TTCGCGAA)2-ditercalinium complex.
(ii) Tetranucleotide complexes. Similarly, in the complexes formed by the two
drugs with the tetranucleotide d(CpGpCpG)2, the phosphorus signals at the
intercalation sites were downfield shifted. Here the analogue complex showed exchange,
allowing assignment of the bound phosphorus atoms. The "inter-site" phosphorus
in the analogue complex gave two bound signals, one upfield shifted and one
downfield shifted, indicating either two complexes or an asymmetrical complex at this level.
This was not observed in the ditercalinium complex, where the "inter-site" phosphorus is
only downfield shifted.


DISCUSSION 

These results indicate that both drug dimers bis-intercalate into DNA via the major
groove with similar geometry at the intercalation site, but with slight differences at the
DNA level, particularly in the sugar-phosphate backbone. The question remains as to
whether the difference in chemical shifts between ditercalinium and analogue complexes
is due to a different geometry and/or different dynamics. To answer this question
and to try to explain the difference in activity of these two drugs (at the molecular
level), molecular modelling calculations will be carried out on these two complexes,
and the relaxation matrix analysis of the models will be compared to the experimental
2D-NOESY spectra already obtained.
SUMMARY

The structure of crambin in solution is determined from 2D NMR data. Distance
constraints are obtained from nuclear Overhauser effect (NOE) measurements using
an iterative relaxation matrix approach (IRMA), which is applied to a protein for the first
time. R-factor calculations are performed on NOE buildups to describe the quality of
agreement between theory and experiment. The final structure is within 1 Å (backbone
r.m.s.) of the crystal structure. Refinement procedures are discussed.

INTRODUCTION 

Crambin is a small, water-insoluble protein of 46 residues, which can be obtained
from the seeds of Crambe abyssinica. Its precise function is not known, but the
sequence is homologous to a number of plant toxins. The crystal structure was solved
to 1.5 Å resolution (ref. 1); a 0.9 Å structure has also been obtained but the
coordinates have not been published. Despite its small size, the molecule contains
two α-helices and a short β-sheet. Its three disulfide links are probably responsible for
its high thermostability.

The present NMR study was undertaken for several reasons. First, we wanted to
establish whether there are appreciable differences between the crystal and solution
structures. Secondly, crambin, with its variety of secondary structure elements and its
stable structure, provided a good system for testing a number of novel procedures; in
developing these, the crystal structure could be used as a reference point.

EXPERIMENTAL 

Crambin displays amino acid heterogeneity at positions 22 (Pro or Ser) and 25
(Leu or Ile). It has been shown that the mixture as obtained in the isolation procedure
consists of two species, the Ser/Ile and the Pro/Leu forms, in a 55:45 ratio (ref. 2). The
present analysis is based upon the NMR data for the Pro/Leu form. However, the
crystal structure was determined assuming that the Pro/Ile form is the dominant
component (ref. 1). We therefore replaced Ile-25 by Leu in the structure as obtained
from the Brookhaven Protein Data Bank, and applied 30 steps of unconstrained
energy minimization. The r.m.s. position change of the backbone atoms was only
0.06 Å. In the following, this modified structure will be denoted as “the X-ray structure”.

Two-dimensional NMR spectra of crambin in D2O/acetone and in H2O/acetone
were recorded at 500 MHz. Proton resonances were assigned using the established
sequential assignment procedures (ref. 3). The secondary structure has been reported
before (ref. 4). It is identical to that in the crystal: α-helices are formed by residues 7-18
and 23-30, while the last residues of the first helix adopt a 3₁₀ conformation; an
antiparallel β-sheet is formed by residues 1-4 and 32-35.

Structure  calculations  were  based  upon  2D  NOE  spectra,  in  which  775  cross  peaks 
could  be  assigned.  The  majority  of  these  (646)  had  a  buildup  of  sufficient  quality  to  be 
used  in  a  quantitative  analysis.  Spectra  were  recorded  at  six  different  mixing  times 
ranging  from  20  to  250  ms.  All  peaks  were  integrated  by  summing  the  intensity  within  a 
rectangular  area  around  the  peak. 
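The rectangular-box integration described above can be sketched in a few lines of Python with NumPy. The function name, the index-based box limits and the synthetic Gaussian peak are illustrative assumptions, not the authors' software.

```python
import numpy as np


def box_integral(spectrum, f1_range, f2_range):
    """Sum the intensity of a 2D spectrum inside a rectangular box.

    spectrum is indexed [f1, f2]; each range is a (start, stop) pair of point
    indices with stop excluded, as in Python slicing.
    """
    (a, b), (c, d) = f1_range, f2_range
    return float(np.sum(spectrum[a:b, c:d]))


# Synthetic spectrum: one Gaussian cross peak (width 2 points) on a zero baseline.
f1, f2 = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
spectrum = np.exp(-((f1 - 20) ** 2 + (f2 - 40) ** 2) / (2 * 2.0 ** 2))
volume = box_integral(spectrum, (12, 29), (32, 49))  # box of +/- 8 points around the peak
```

The box has to be wide enough to contain the peak tails (here ±4 linewidth parameters, which recovers essentially the full volume 2πσ²) but narrow enough not to pick up neighbouring peaks or baseline noise.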

ITERATIVE  RELAXATION  MATRIX  APPROACH 

Distance  bounds  were  calculated  from  the  observed  NOE  buildup  curves  by  the 
Iterative  Relaxation  Matrix  Approach,  or  IRMA  (ref.  5).  In  this  procedure  spin  diffusion  is 
accounted  for  by  solving  the  Bloch  equation  describing  multispin  relaxation 

$$A(t_m) = A(0)\,\exp(-R\,t_m) \qquad (1)$$

where $A_{ij}(t_m)$ is the NOE intensity belonging to the spin pair (i,j) at mixing time $t_m$,
and $R$ is the relaxation matrix. Starting from a model structure, theoretical NOE values are
calculated assuming that the molecule is rigid and rotates isotropically with a
correlation time τc. Theoretical values are replaced by experimental ones when
available, and the combined NOE matrix is transformed back to a corrected relaxation
matrix, from which new distances are calculated assuming the same motional
behaviour as before. Distance bounds are directly related to the variation of the
back-transformed relaxation matrix elements with the mixing time. Structure calculations
are then performed using the new distance bounds. The whole process may be repeated,
starting from the optimized structure, until convergence is obtained.

Relaxation matrix elements are calculated in the rigid rotor model directly from the
interproton distances and the rotational correlation time (ref. 6). An extra decay term is
added to the dipolar contribution on the diagonal to describe external leakage. We set
this extra term to 1.25 s⁻¹. The correlation time τc was set somewhat arbitrarily at 1 ns.
Since $R_{ij}$ is to first order linear in τc, the scaling of experimental to theoretical NOE
values makes the whole procedure rather insensitive to the exact choice of τc, as long
as ωτc ≫ 1.
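The rigid-rotor relaxation matrix can be sketched as follows, assuming the standard isotropic-tumbling expressions for like spins, σij = (Kij/10)[6J(2ω) - J(0)] and ρij = (Kij/10)[J(0) + 3J(ω) + 6J(2ω)], with J(ω) = τc/(1 + ω²τc²) and Kij = (μ0/4π)²ħ²γ⁴/rij⁶, plus the 1.25 s⁻¹ leakage term, τc = 1 ns and the 500 MHz field given in the text. The function names and the use of coordinates in Å are assumptions of this sketch.

```python
import numpy as np

HBAR = 1.054571817e-34     # reduced Planck constant, J s
GAMMA_H = 2.6752218744e8   # proton gyromagnetic ratio, rad s^-1 T^-1
MU0_OVER_4PI = 1e-7        # magnetic constant / (4 pi), T m A^-1


def spectral_density(omega, tau_c):
    """Rigid isotropic rotor: J(omega) = tau_c / (1 + omega^2 tau_c^2)."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def relaxation_matrix(coords_angstrom, tau_c=1e-9, freq_hz=500e6, leakage=1.25):
    """Build R (s^-1) for protons at the given positions (shape n x 3, in Angstrom)."""
    x = np.asarray(coords_angstrom, dtype=float) * 1e-10  # Angstrom -> m
    n = len(x)
    omega = 2.0 * np.pi * freq_hz
    j0 = spectral_density(0.0, tau_c)
    j1 = spectral_density(omega, tau_c)
    j2 = spectral_density(2.0 * omega, tau_c)
    dipolar = MU0_OVER_4PI ** 2 * HBAR ** 2 * GAMMA_H ** 4  # m^6 s^-2
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(x[i] - x[j])
            k = dipolar / r ** 6 / 10.0
            sigma = k * (6.0 * j2 - j0)           # cross-relaxation rate
            rho = k * (j0 + 3.0 * j1 + 6.0 * j2)  # pair contribution to auto-relaxation
            R[i, j] = R[j, i] = sigma
            R[i, i] += rho
            R[j, j] += rho
    R[np.diag_indices(n)] += leakage  # external leakage on the diagonal
    return R


# Two protons 2 Angstrom apart.
R2 = relaxation_matrix([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
```

At 500 MHz with τc = 1 ns, ωτc ≈ 3.1, so the cross-relaxation rate comes out negative (about -0.76 s⁻¹ for a 2 Å pair), as expected outside the extreme-narrowing regime.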

Modifications  are  made  to  the  relaxation  matrix  in  order  to  describe  the  effect  of 
methyl  group  rotations  and  of  aromatic  ring  flips.  One  possibility  is  to  add  a  kinetic 
constant  to  the  cross-relaxation  terms  involving  two  mutually  exchanging  protons.  This 
turns  out  to  be  equivalent  (ref.  7)  to  a  simple  averaging  of  all  corresponding  matrix 
elements of the equivalent protons, a procedure proposed by Landy and Rao for
multiple-spin  systems  undergoing  chemical  exchange  (ref.  8).  The  averaging  method 
has  been  used  here. 
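The simple averaging can be sketched as below, assuming NumPy and a symmetric relaxation matrix; each group lists the indices of protons that exchange rapidly (for example the three protons of a methyl group). The function name, the example matrix and the choice to average diagonal and intra-group elements separately are illustrative assumptions, not a transcription of the procedure of ref. 8.

```python
import numpy as np


def average_equivalent(R, groups):
    """Average relaxation-matrix elements over groups of rapidly exchanging protons.

    R: symmetric relaxation matrix; groups: lists of proton indices that are
    equivalent on the NMR time scale (e.g. a methyl group). Returns a new matrix.
    """
    R = np.array(R, dtype=float)  # work on a copy
    n = R.shape[0]
    for group in groups:
        g = list(group)
        # cross-relaxation between the group and every spin outside it
        for k in range(n):
            if k in g:
                continue
            mean = R[g, k].mean()
            R[g, k] = mean
            R[k, g] = mean
        # auto-relaxation (diagonal) elements of the group
        R[g, g] = R[g, g].mean()
        # cross-relaxation within the group
        if len(g) > 1:
            intra = np.mean([R[a, b] for a in g for b in g if a != b])
            for a in g:
                for b in g:
                    if a != b:
                        R[a, b] = intra
    return R


# Arbitrary example: protons 1 and 2 form one exchanging group.
R_demo = np.array([[1.0, 0.2, 0.4],
                   [0.2, 2.0, 0.6],
                   [0.4, 0.6, 3.0]])
R_avg = average_equivalent(R_demo, groups=[[1, 2]])
```

The result stays symmetric, so it can be passed directly to an eigendecomposition-based solution of eqn (1).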

STRUCTURE  CALCULATIONS 
Computational  procedures 

In the first cycle, calculations were performed with the Distance Geometry algorithm,
derived from the original EMBED program (ref. 9). After embedding, all structures were
subjected to 200 steps of constraint function minimization. This was followed by two
runs of 300 steps of Distance bound Driven Dynamics (DDD), i.e. a Newtonian
dynamics simulation using chiral and distance constraint functions without force field
terms (ref. 10). The first run was performed at 300 K, the second at 1 K, which has
the effect of minimizing the constraint errors. All information about the covalent
structure, i.e. standard bond lengths and angles, is represented by upper and lower
bounds between atoms separated by less than four bonds.
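The constraint-function minimization that follows embedding can be illustrated with a flat-bottom penalty that is zero when a distance lies within its bounds and quadratic outside them, minimized here by plain steepest descent for 200 steps. This is a minimal sketch assuming NumPy; the functional form, step size and helper names are assumptions, and it omits the chirality terms and the DDD dynamics of the actual program.

```python
import numpy as np


def bound_penalty(x, pairs, lower, upper):
    """Flat-bottom penalty sum(viol^2): zero inside [lower, upper], quadratic outside.

    Returns (energy, gradient with the same shape as x).
    """
    energy = 0.0
    grad = np.zeros_like(x)
    for (i, j), lo, up in zip(pairs, lower, upper):
        diff = x[i] - x[j]
        d = np.linalg.norm(diff)
        if d > up:
            viol = d - up
        elif d < lo:
            viol = d - lo
        else:
            continue
        energy += viol ** 2
        g = 2.0 * viol * diff / d
        grad[i] += g
        grad[j] -= g
    return energy, grad


def minimize_bounds(x, pairs, lower, upper, steps=200, step_size=0.05):
    """Plain steepest descent on the bound penalty (a stand-in for the real optimizer)."""
    x = np.array(x, dtype=float)
    for _ in range(steps):
        _, grad = bound_penalty(x, pairs, lower, upper)
        x -= step_size * grad
    return x, bound_penalty(x, pairs, lower, upper)[0]


pairs = [(0, 1), (0, 2), (1, 2)]
lower, upper = [1.0, 1.0, 1.0], [1.5, 1.5, 1.5]
start = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
x_opt, e_opt = minimize_bounds(start, pairs, lower, upper)
```

The flat bottom means that any distance already inside its bounds exerts no force, so the optimizer only moves atoms involved in violated constraints.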

Further optimizations were performed with Energy Minimization (EM) and
Molecular Dynamics (MD) techniques using the GROMOS force field and programs
(ref. 11). Since no solvent was included, all net charges of side chains were reduced to
zero. A cut-off of 8 Å was applied, and the pair list was updated every 10 steps. MD
calculations were performed with the leapfrog algorithm and a 2 fs time step, with
coupling to a heat bath using a time constant of 10 fs.
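Leapfrog integration with weak coupling to a heat bath can be illustrated with the toy sketch below. It is not GROMOS: it integrates independent one-dimensional harmonic oscillators in reduced units (k_B = 1, unit masses, time in ps), and only the 2 : 10 ratio of time step to coupling time is taken from the text. The velocity scaling factor λ = [1 + (Δt/τT)(T0/T - 1)]^(1/2) is the usual Berendsen weak-coupling expression; the function name and all numerical values are assumptions.

```python
import numpy as np


def leapfrog_berendsen(x, v, force, mass=1.0, dt=0.002, tau_t=0.010, t_ref=1.0, n_steps=1000):
    """Leapfrog MD with Berendsen weak coupling to a heat bath (reduced units, k_B = 1).

    x: positions at time t; v: velocities at t - dt/2 (one degree of freedom per entry).
    Returns final x, v and the instantaneous temperature at every step.
    """
    x = np.array(x, dtype=float)
    v = np.array(v, dtype=float)
    temps = np.empty(n_steps)
    for step in range(n_steps):
        t_inst = np.mean(mass * v ** 2)  # kinetic temperature from half-step velocities
        lam = np.sqrt(1.0 + dt / tau_t * (t_ref / t_inst - 1.0))
        v = lam * (v + force(x) / mass * dt)  # v(t + dt/2), scaled towards t_ref
        x = x + v * dt                        # x(t + dt)
        temps[step] = t_inst
    return x, v, temps


rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 0.1, size=200)
v0 = rng.normal(0.0, 0.5, size=200)
x_end, v_end, temps = leapfrog_berendsen(x0, v0, lambda q: -100.0 * q, n_steps=2000)
```

With a coupling time of only five time steps, as in the text, the kinetic temperature is pulled back to the reference value within a few tens of steps, which is effective for annealing but strongly damps natural temperature fluctuations.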

IRMA  distance  limits  (ref.  5)  were  used  directly  as  upper  and  lower  distance 
constraints  in  the  simulations.  Pseudo  atoms  were  introduced  on  all  prochiral  centres 
and  the  bounds  were  relaxed  accordingly  (refs.  3,12).  For  methyl  groups  pseudo 
atoms  were  used  without  correcting  the  bound:  the  process  of  kinetic  averaging  leads 
to  a  distance  value  that  is  more  representative  of  the  distance  to  the  geometric  centre 
of  the  three  equivalent  protons,  i.e.  the  pseudo  atom,  than  of  the  distance  to  any  of  the 
individual  protons. 
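The pseudo-atom treatment can be sketched as follows, assuming NumPy: the pseudo atom is placed at the geometric centre of the protons it replaces, and for non-methyl prochiral groups the upper bound is widened by the centre-to-proton distance. In practice fixed correction constants per group type are often used instead of this geometric value; the helper names here are hypothetical.

```python
import numpy as np


def pseudo_atom(protons):
    """Geometric centre of the protons represented by one pseudo atom."""
    return np.mean(np.asarray(protons, dtype=float), axis=0)


def widened_upper_bound(upper, protons, is_methyl=False):
    """Relax an upper bound when a proton group is replaced by its pseudo atom.

    Hypothetical helper: adds the largest centre-to-proton distance. Methyl groups
    are left uncorrected, as in the text, because kinetic averaging already yields a
    distance representative of the geometric centre.
    """
    if is_methyl:
        return float(upper)
    p = np.asarray(protons, dtype=float)
    correction = np.max(np.linalg.norm(p - pseudo_atom(p), axis=1))
    return float(upper + correction)


# A methylene pair 1.8 Angstrom apart: the pseudo atom sits midway, 0.9 Angstrom from each.
ch2 = [[0.0, 0.0, 0.0], [1.8, 0.0, 0.0]]
print(pseudo_atom(ch2), widened_upper_bound(4.0, ch2))
```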

IRMA  cycles 

Two cycles of IRMA have been performed, starting from a fully extended chain. In
the first cycle 8 DG-DDD structures were calculated, of which 7 belonged to one class;
for these 7 the pairwise r.m.s. deviation for the backbone was 1.6 Å on average. One
structure differed by more than 6 Å from the others, but this one also had the largest
violations. Before the Distance bound Driven Dynamics algorithm was applied, the
average backbone r.m.s. value for the 7 structures amounted to 1.2 Å, which
demonstrates that the DDD procedure improves the sampling of allowed
configurational space. The best DG-DDD structure was simulated for 20 ps with
restrained MD followed by EM. The result was used to recalculate the distance
bounds with IRMA, starting the second cycle.
Using the new bounds, structures were calculated in two different ways. The first
procedure consisted of extensive MD refinement following an annealing strategy. The
second procedure was a combination of DG-DDD with short MD optimization. The
annealing simulation started from the best structure obtained in the previous cycle.
First, 25 ps of restrained MD were performed at 600 K. The elevated temperature
increases the rate of transitions between allowed conformations. This is illustrated by
the occurrence of a trans to cis peptide transition for Pro-19 after approximately 15 ps.
Five structures from the trajectory, one every 5 ps, were then simulated for 35 ps at
300 K. Five average structures were calculated from the last 20 ps of the latter
trajectories and subjected to EM.
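The averaging step of the annealing protocol can be sketched as below, assuming NumPy and a trajectory stored as an array of frames that have already been superimposed (averaging unaligned frames would blur the structure). Averaged coordinates generally have distorted covalent geometry, which is why such average structures are energy minimized afterwards. The helper name and the toy trajectory are hypothetical.

```python
import numpy as np


def window_average(times_ps, frames, window_ps=20.0):
    """Average coordinates over the frames in the last `window_ps` of a trajectory.

    frames: array of shape (n_frames, n_atoms, 3), assumed already superimposed
    (overall rotation and translation removed) before averaging.
    """
    t = np.asarray(times_ps, dtype=float)
    frames = np.asarray(frames, dtype=float)
    keep = t >= t[-1] - window_ps
    return frames[keep].mean(axis=0)


# Toy 35 ps trajectory saved every 1 ps; coordinates drift linearly with time.
times = np.arange(0.0, 36.0, 1.0)
frames = times[:, None, None] * np.ones((1, 2, 3))
average = window_average(times, frames)
```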

In the DG-DDD-MD procedure, 10 structures were calculated with the DG-DDD
technique (as described above) using the new bounds. After energy minimization, each
of these was subjected to 10 ps of restrained MD. The final structures were again
energy minimized.


RESULTS  AND  DISCUSSION 
Annealing  run 

The  potential  energies  and  distance  restraint  energies  of  the  five  structures  are 
remarkably  similar,  see  Table  1,  despite  the  conformational  change  of  Pro-19. 


TABLE 1

Total potential energies, distance restraint energies and total bound violations after
MD and EM optimization (second IRMA cycle, annealing runs)ᵃ.

Structure   Potential energyᵇ   Distance restraint energyᶜ   Total violation
            (kJ mol⁻¹)          (kJ mol⁻¹)                   (Å)
A1          -2104               199                          22.8
A2          -2079               200                          23.2
A3          -2072               204                          23.5
A4          -2120               207                          24.0
A5          -2150               205                          23.0

ᵃ Data for minimized average structures from the last 20 ps of 5 different runs (see text).
ᵇ Excluding distance restraint energy.
ᶜ Force constant 40 kJ mol⁻¹ Å⁻².


The r.m.s. position differences averaged over the backbone atoms are 0.6 Å or less.
The structures have converged with respect to the configurations obtained at 600 K:
the backbone r.m.s. deviations between the latter five structures are around 1.0 Å.
Overall, the structure has converged towards the X-ray structure: backbone r.m.s.
differences with respect to the crystal structure have changed from 1.5 via 0.9 to 0.8 Å
after completion of, respectively, the DG-DDD calculation, the first cycle and the second
cycle. The largest differences between calculated and crystal structures occur in the
loop region between the helices and in the ten terminal residues.
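Backbone r.m.s. differences of this kind are normally computed after optimal superposition of the two coordinate sets. A minimal sketch using the Kabsch (SVD) superposition is given below, assuming NumPy and that both arrays hold the same backbone atoms in the same order; the text does not state which superposition algorithm was actually used.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """r.m.s. deviation between matched coordinate sets P, Q (n x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q             # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation (reflection)
    rot = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ rot.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


rng = np.random.default_rng(1)
P = rng.normal(size=(20, 3))
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])  # rotated and translated copy of P
```

For a rigidly rotated and translated copy the superposed r.m.s. deviation vanishes, whereas the unaligned value does not; this is why r.m.s. figures are only meaningful after superposition.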

When the r.m.s. differences are calculated over all atoms, the five MD structures are
within 1.3 Å (1.0 Å for the structures in which Pro-19 is trans), indicating that most
side chains are also rather well constrained. All-atom r.m.s. positional differences with
the X-ray structure are between 1.5 and 1.7 Å.

DG-DDD-MD 

Three of the ten structures have distance restraint energies and total violations of
similar magnitude to those obtained in the annealing run. The other structures satisfy the
constraints less well. However, the total potential energies (excluding the constraint
energy) are all higher than for the annealed structures. The difference is at least
150 kJ mol⁻¹, which far exceeds the thermal fluctuation.


Fig. 1a (left) and 1b (right). Correlations between total potential energy (excluding
distance restraint energy), distance restraint energy and r.m.s. positional difference.
Data are shown for 1 structure obtained in annealing simulations (denoted as "A1")
and for 10 structures obtained in DG-DDD-MD simulations (of which 4 are indicated by
numbers), cf. notation in Tables 1 and 2.


The total energy is plotted against the distance restraint energy in Fig. 1a. Overall,
the quantities are correlated rather well (with the exception of one structure), showing
that the force field and the NMR constraints are not inconsistent. The total energy is
also correlated with the backbone r.m.s. positional difference with respect to one of the
annealed structures, see Fig. 1b. The relatively quick DG-DDD-MD procedure thus
appears to probe around the minimum found by the more elaborate annealing
procedure, yielding best structures that are approximately 1 Å (backbone r.m.s.) from
the annealed structures.


R-FACTOR

Perhaps the best way of judging the quality and accuracy of the present results is to
compare directly measured NOEs with values predicted from the structure on the basis
of eqn. (1). A useful definition is that of the R-factor


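$$R = \frac{\sum_{t_m}\sum_{i,j} t_m\,\left|A_{ij}^{\mathrm{exp}}(t_m) - A_{ij}^{\mathrm{th}}(t_m)\right|}{\sum_{t_m}\sum_{i,j} t_m\,A_{ij}^{\mathrm{exp}}(t_m)} \qquad (2)$$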


which  is  analogous  to  the  crystallographic  definition.  Here  the  mixing  times  are  used 
as  weight  factors,  since  the  measurements  are  least  accurate  for  the  smallest  mixing 
times.  The  summations  in  eqn.  (2)  are  restricted  to  the  interresidue  NOEs  since  these 
are  structurally  the  most  important  ones.  In  cases  involving  prochiral  centres  the 
theoretical  NOEs  are  averaged  over  the  two  assignments.  Table  2  shows  that  the 
R-factor  has  dropped  considerably  in  going  from  the  initial  linear  chain  to  the  result  of 
the  second  IRMA  cycle. 
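Eqn. (2) is straightforward to evaluate once measured and back-calculated NOEs are arranged per mixing time. The sketch below assumes NumPy, arrays of shape (number of mixing times, number of interresidue cross peaks), and the mixing-time-weighted form of eqn. (2); the function name and example numbers are hypothetical, and the averaging over prochiral assignments mentioned in the text is assumed to have been done beforehand.

```python
import numpy as np


def noe_r_factor(mixing_times, a_exp, a_th):
    """Mixing-time-weighted NOE R-factor, following the form of eqn (2).

    mixing_times: shape (n_t,); a_exp, a_th: shape (n_t, n_pairs) holding the
    experimental and theoretical interresidue NOE intensities at each mixing time.
    """
    t = np.asarray(mixing_times, dtype=float)[:, None]  # weights, broadcast over pairs
    a_exp = np.asarray(a_exp, dtype=float)
    a_th = np.asarray(a_th, dtype=float)
    return float(np.sum(t * np.abs(a_exp - a_th)) / np.sum(t * a_exp))


tm = [0.05, 0.10]  # mixing times in s
exp_noe = [[1.0, 2.0], [2.0, 4.0]]
calc_noe = [[1.0, 1.0], [2.0, 3.0]]
r = noe_r_factor(tm, exp_noe, calc_noe)
```

Weighting by the mixing time down-weights the short-mixing-time buildup points, whose small intensities are the least accurately measured.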


TABLE 2

R-factors for different crambin structuresᵃ.

Structureᵇ          all    backbone-backbone   side chain-side chain
linear chain        0.99   0.92                0.94
after 1st cycle     0.56   0.39                0.66
A1                  0.53   0.38                0.69
A2                  0.56   0.36                0.91
A3                  0.51   0.36                0.66
D2                  0.52   0.36                0.75
D3                  0.60   0.43                0.81
D6                  0.56   0.35                0.67
D8                  0.62   0.41                1.03
X-ray structure     0.47   0.41                0.56

ᵃ R-factor, eqn. (2), calculated over interresidue contacts only.
ᵇ Structures from annealing runs are indicated by prefix “A”, structures from DG-DDD-MD
runs by prefix "D" (cf. numbering in Table 1 and Fig. 1).
The partial R-factor for backbone-backbone contacts, which is only indirectly
influenced by the use of pseudo atoms, has dropped below the X-ray structure value,
in agreement with the small but persistent structural deviation obtained in the
simulations. The side chain-side chain partial R-factor exhibits much larger variation,
reflecting the structural differences. However, the fact that one of the best annealed
structures shows a particularly high side chain-side chain R-factor, and the X-ray
structure the lowest, makes it clear that the present distance constraints are not
sufficient to uniquely define the best structure. In particular, the introduction of pseudo
atoms on prochiral centres leads to a loss of precision. A second source of error is the
neglect of local motions in the theoretical calculations.

Overall, two of the annealed structures (A1, A3) and two of the DG-DDD-MD
structures (D2, D6) appear to agree best with the experimental NOEs. Structure D8,
which also had a low distance restraint energy but a relatively high GROMOS energy
(cf. Fig. 1a), shows a high R-factor due to improper positioning of the side chains.

CONCLUSIONS 

The present constraint set for crambin in combination with the GROMOS force field
corresponds to one well-defined minimum, which, as far as the protein backbone is
concerned, is close to the crystal structure. The residual violations are already quite
small after the first cycle. Further optimization leads to a lowering of the GROMOS
energy without any significant changes in the restraint energies. The best structures
obtained from a combination of DG and short MD runs have constraint violations and
R-factors that are similar to those of structures obtained in an extensive annealing
optimization, but the total potential energy of the latter is significantly lower. The best
structures differ by approximately 1 Å on the backbone from one another and from the
X-ray structure. Side chain orientations are less well determined. We are currently
investigating the effect of including in the simulations dihedral angle constraints
obtained from J-couplings. The precision of the structure determination can be further
increased by obtaining stereospecific assignments. A procedure to derive these
assignments from a comparison between measured and calculated NOEs shows
promising results.
SUMMARY

For determining the 3D structure of a molecule in solution,
the most powerful method is based on measurements of
proton-proton NOEs. However, their conversion into distances is
ambiguous because of internal molecular motions and spin diffusion.
We present an approach for structure refinement based on the
superposition of experimental and calculated NOE intensities,
which seems a better strategy than fitting estimated to
calculated distances. We show how to take account of the
internal motions of the molecule. This refinement method is
applied to the conformational analysis of a cyclic lipopeptide,
stendomycine.


INTRODUCTION

3D structure determination of biological molecules from
combined 2D NMR and computer simulation methods is essentially
based on interproton distances derived from measurements of
NOESY cross peak intensities (refs. 1-2).

The NOE (Nuclear Overhauser Effect) intensities depend on
the cross-relaxation rates of protons, which in turn depend on
fluctuations in the orientation and length of the interproton
vectors (1/r⁶ law) (ref. 3). In order to estimate distances,
which can be calibrated using a known distance (ref. 4), all
interproton vectors are assumed to move isotropically with an
identical correlation time. However, a direct relation between
cross peak intensity and cross-relaxation rate neglects the
uncertainties in the internal motional behaviour of the molecule,
together with the spin diffusion effect (ref. 5) induced by indirect
magnetization transfer via other protons, and therefore yields
only approximate ranges of proton-proton distances.
Questions about the reliability of the distances thus
estimated led us to undertake NMR studies in order to analyse
the internal dynamics of small molecules. Studies of NOESY cross
peak intensities as a function of temperature allow us to
discriminate between invariant and fluctuating interproton
distances and finally give a qualitative picture of the
flexibility of the molecule. Furthermore, correlation times of
fixed distances can be determined (ref. 6).

Such  an  approach  has  been  applied  to  study  the  internal 
dynamics  of  the  stendomycine  molecule,  an  antifungal 
tetradecapeptide  composed  of  a  lactone  ring  and  a  terminal 
peptidic  and  fatty  acid  linear  chain  (ref.  7) . 

The reliability of the distances was expressed by different
sets of weighting factors for distance constraints during the
computer simulation process using the GROMOS program (ref. 8). A
structural model of stendomycine is proposed and its validity is
discussed on the basis of the comparison between the
experimental and the theoretical NOESY maps (ref. 9).

METHODS 

Molecular mechanics calculations were performed using the
GROMOS potential energy functions, including bond stretching,
bond angle bending, harmonic (out of plane, out of tetrahedral
configuration) dihedral bending, sinusoidal dihedral torsions, and
van der Waals and electrostatic interactions. A constraint
potential term was added in order to make the molecule satisfy
the set of NOE distance constraints:

$$V_{dc} = \frac{1}{2} K_{dc} \left(r_{ij} - r_{ij}^{*}\right)^2$$

where $r_{ij}$ is the actual distance between atoms i and j and
$r_{ij}^{*}$ is the given distance constraint for this pair.

Chiral  centers  were  specified  along  with  planarity 
constraints  for  carbonyl  carbons  and  amide  nitrogens.  All 
peptide bonds were set trans and the ester bond was kept nearly
planar.

No information was obtained about the Cβ chiral center of the
special residue stendomycidine, so the two possible
configurations R and S have been examined. We focus our study
essentially on the structure of the lactone ring and on the
folding mode of the beginning of the peptidic linear chain. The
lack of information about the end of this chain prevents us from
modelling its conformation correctly. Consequently, the computer
simulations were undertaken on the truncated molecule ending at
the DAla9 residue. A terminal acetyl group simulates the DIle10
residue.

RESULTS  and  DISCUSSION 
Starting  structure 

In this short paper we report preliminary results of the
restrained energy minimizations of the model-built structure
derived from graphic manipulations on an Evans and Sutherland
PS390 display system. The major difficulty is to build a
graphic model which respects most of the NMR-estimated distances
together with an approximate ring closure.

Interproton distances deduced from NOESY cross peak
intensities can be divided into three groups (ref. 7). In the first
group, distances are classified as reliable. In the second group,
the distances estimated from the 1/r⁶ approximation are
certainly erroneous due to internal flexibility of the molecule.
In the last group, the distances are obtained from weak
cross peak intensities; in this case, some of the distances are
certainly correct and others are not.

Among the eight non-fluctuating distances, the
Hβ(But8)-Hγ(But8) distance, evaluated at 0.268 nm, was used as
reference for calibration. This set of distances involves mainly
intraresidue information, except for two of them (NH(But8)-NH(Thr1)
and NH(But8)-Hα(Ala9)), which lead to a particularly well-defined
structure of the beginning of the peptidic linear chain.
Information such as the NH(But8)-NH(Val2) proximity and a detectable
cross peak between NH(But8) and Hα(Val3) tend to indicate the proximity
of one part of the linear chain to the lactone ring. The model-built
structure reflects this proximity, but the lack of reliable
interproton distances is a strong handicap to ensuring the
uniqueness of the starting structure. However, other interproton
proximities characterized by fluctuating distances enable us to
guide the structural arrangement of the residues.
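Under the isolated spin pair assumption that underlies this calibration, a cross peak intensity scales as r⁻⁶, so an unknown distance follows from r_ij = r_ref (a_ref / a_ij)^(1/6), with r_ref = 0.268 nm for the Hβ(But8)-Hγ(But8) reference. The sketch below is a minimal illustration with a hypothetical function name; as the text stresses, it is only meaningful for non-fluctuating distances and when spin diffusion is negligible.

```python
def calibrated_distance(intensity, ref_intensity, ref_distance_nm=0.268):
    """Isolated-spin-pair estimate: r = r_ref * (a_ref / a) ** (1/6), in nm.

    intensity, ref_intensity: cross peak intensities of the unknown and the
    reference pair, measured in the same spectrum (same mixing time).
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("cross peak intensities must be positive")
    return ref_distance_nm * (ref_intensity / intensity) ** (1.0 / 6.0)


# A cross peak 64 times weaker than the reference corresponds to twice the distance.
print(calibrated_distance(1.0, 64.0))
```

The sixth-root dependence makes the estimate tolerant of intensity errors (a 50 % intensity error changes the distance by under 10 %), but it cannot compensate for motional averaging or spin diffusion.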


Refined  structure 

First, strain in the graphic model was relaxed by performing
steepest descent minimizations, and then a series of constrained
energy minimizations using the conjugate gradient algorithm,
followed by complete relaxation, were applied successively.

Two approaches have been tested to run restrained energy
optimizations. The first one does not take account of the
reliability of the interproton distances, and all the
constraints were equally weighted (force constant = 10^… kJ mol⁻¹
nm⁻²) (structures AR and AS). The second one uses
weighting factors dependent on the reliability of the distances,
and a ratio of 10 was taken between the force constants for reliable
and unreliable distances (structures BR and BS).
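The two weighting schemes can be sketched with the harmonic constraint term Vdc of the METHODS section, assuming NumPy: reliable distances receive the full force constant and unreliable ones a force constant ten times smaller, while a ratio of 1 reproduces equal weighting. The force constant and distances used in the example are arbitrary values, and the function name is hypothetical.

```python
import numpy as np


def restraint_energy(r, r_target, reliable, k_reliable, ratio=10.0):
    """Sum of V_dc = 1/2 K_dc (r_ij - r*_ij)^2 with reliability-dependent force constants.

    Reliable distances get k_reliable; the others k_reliable / ratio
    (ratio = 1 reproduces equal weighting).
    """
    r = np.asarray(r, dtype=float)
    r_target = np.asarray(r_target, dtype=float)
    k = np.where(np.asarray(reliable, dtype=bool), k_reliable, k_reliable / ratio)
    return float(0.5 * np.sum(k * (r - r_target) ** 2))


# Two restraints in nm; the force constant (kJ mol^-1 nm^-2) is an arbitrary example value.
r_now = [0.30, 0.40]
r_star = [0.28, 0.35]
weighted = restraint_energy(r_now, r_star, [True, False], k_reliable=1000.0)
equal = restraint_energy(r_now, r_star, [True, False], k_reliable=1000.0, ratio=1.0)
```

With equal weighting, the larger violation of the unreliable restraint dominates the constraint energy, which illustrates how such restraints can drive the strained geometries reported below.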

Comparison between constrained and total energies of the
respective relaxed structures shows the BR structure to be preferable
(Edc = 144.5 kJ mol⁻¹, ETot = -108.5 kJ mol⁻¹). Stendomycine
including an S configuration for the Ste7 residue is relatively
more strained (BS: Edc = 152.4 kJ mol⁻¹, ETot = -54.4 kJ mol⁻¹).

Superposition of the relaxed and constrained BR structures
shows a good similarity. The relaxation of reliable distances
concerns principally the distances involving the NH(Thr1) group.
This corresponds to a favorable rearrangement of polar groups
improving hydrogen bond interactions between the first residues
of the peptidic linear chain and the Val residues of the lactone
ring. The results obtained with the two approaches to handling
the constraints are illustrated in Figure 1, which shows the
superposition of the BR and AR relaxed structures. The folding
of the lactone ring looks globally the same, but the hydrogen
bond network is completely different. For example, CO(Ala2)
is hydrogen bonded to the NH of the two Val residues in the BR
structure, whereas it is replaced by CO(But8) in the AR form.
In this last structure, NH(Val2) tends to share this
hydrogen bond with the CO(Ser5) carbonyl group.

Generally, structures optimized with equally weighted
constraints led to strained geometries (high bond and angle
energies) and to chirality distortions, in order to satisfy the
set of constraints as well as possible. This is illustrated in
Table 1, which gives the violations of reliable distances and the
strain energies for the constrained AR, AS, BR and BS
structures.


Fig. 1. Stereoview of the superposition of the relaxed
structures modeling stendomycine.


TABLE 1

Distance violations of the reliable interproton distances, along
with strain energies, for the constrained structures AR, AS, BR
and BS (see text).

                              Distance violations (nm)
Reliable distances          AR       AS       BR       BS
NH(Ala9)-CαH(Ala9)        0.034    0.034    0.030    0.029
CαH(Ala9)-NH(But8)        0.013    0.028    0.020    0.004
NH(But8)-NH(Thr1)        -0.017    0.023    0.012    0.011
NH(Thr1)-CβH(Thr1)       -0.058    0.031   -0.016    0.047
CαH(Val2)-CβH(Val2)       0.049    0.048    0.047    0.047
NH(Val3)-CαH(Val3)        0.001    0.001    0.013    0.013
NH(Ser5)-CαH(Ser5)        0.056    0.052    0.017    0.016

Strain energies (kJ mol-1)
Bond                       65.2     77.3     24.6     27.8
Angle                     228.8    249.7     54.4     64.1

Another helpful criterion for assessing the validity of the
refined structures is a comparison of the experimental and
theoretical NOESY maps. Figures 2a and 2b show, respectively, the
reconstructed experimental NOESY map, exhibiting only the cross
peaks with measurable intensities, and the complete theoretical
NOESY map for the BR relaxed structure. For clarity, all diagonal
peaks are suppressed, and in the experimental map many cross peaks
were removed because strong overlap prevented their assignment.


Fig. 2. Representations of the reconstructed NOESY maps
(diagonal peaks are suppressed): a) experimental map;
b) theoretical map for the best calculated structure
modeling stendomycine.
A detailed comparison of all cross peaks seems difficult in
view of the complexity of the theoretical map, but the regions
corresponding to the NH resonances call for comment. At the
top left of the 2D map, the NH-NH cross-peak pattern shows
noticeable differences in relative intensities, while the
NH-Hα and NH-Hβ,Hγ cross-peak distributions appear comparable.
These observations are a consequence of internal motions. A
simple comparison between calculated and experimentally
estimated distances cannot reveal the influence of internal
motions, which shows the usefulness of fitting two-dimensional
NOE spectra, as already mentioned (refs. 9-10).

CONCLUSION 

Two  essential  points  emerge  from  this  preliminary  study  of 
the  structure  modeling  of  the  antifungal  lipopeptide 
stendomycine . 

First, NMR distance information has been used taking into account
the flexibility of the molecule. It is well known that
NOE intensities depend both on interproton distances and on
internal dynamics. From the NMR studies, we were able to
discriminate between constant distances, which can be used
quantitatively in molecular modeling, and fluctuating distances,
which have to be used only qualitatively. For the latter
distances, the NOE intensities calculated using the classical
formula obviously differ from the experimental ones owing
to the <1/r^6> averaging.
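A small numerical example shows why a fluctuating distance cannot be treated like a fixed one. The NOE reports the average of r^-6, and that average is dominated by the shortest excursions. The two-state distribution below (half the time at 0.25 nm, half at 0.45 nm) is purely hypothetical and serves only as illustration.

```python
def mean_distance(distances, weights=None):
    """Plain weighted average <r>."""
    if weights is None:
        weights = [1.0 / len(distances)] * len(distances)
    return sum(w * r for w, r in zip(weights, distances))

def effective_noe_distance(distances, weights=None):
    """<r^-6>^(-1/6): the single distance that reproduces the averaged NOE."""
    if weights is None:
        weights = [1.0 / len(distances)] * len(distances)
    mean_r_minus6 = sum(w * r ** -6 for w, r in zip(weights, distances))
    return mean_r_minus6 ** (-1.0 / 6.0)

# Hypothetical proton pair jumping between 0.25 nm and 0.45 nm, equally populated
states = [0.25, 0.45]
r_avg = mean_distance(states)
r_noe = effective_noe_distance(states)
print(f"<r> = {r_avg:.3f} nm, <r^-6>^(-1/6) = {r_noe:.3f} nm")
```

In this example the NOE-equivalent distance is about 0.28 nm, much closer to the short state than the 0.35 nm mean. Imposing such a distance as a fixed target would therefore bias the structure.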

Second, the weighting factors introduced to manage distance
constraints during the optimization process were designed to
describe the reliability of the interproton distances estimated
from 2D NMR. This strategy differs somewhat from the usual one,
in which weighting factors increase with the NOE intensities.
Such an approach is doubtful, since it has been shown that weak
NOE intensities are not necessarily related to large distances
and, conversely, that strong NOEs can reflect distances larger
than those estimated in the usual way (ref. 11).
SUMMARY

Pristinamycin (RP7293), a natural antibiotic, comprises two groups (I and II) of synergistic
compounds. Pristinamycin IIA is the major component of PII. The conformation of this molecule
in apolar deuterochloroform solution has been studied using quantitative nuclear Overhauser
effects (NOEs) obtained by 1D and 2D NMR spectroscopy at 400 MHz and 250 MHz.
Restrained molecular dynamics calculations have been performed using the
INSIGHT/DISCOVER software package, taking 25 proton/proton distance constraints into
account. The results were analysed using various programs developed in our
group and interfaced with the program MANOSK. The major conformation obtained by
restrained molecular dynamics calculation is found to be very close to that of the crystal
structure, and it has the same intramolecular hydrogen bond.

INTRODUCTION

Pristinamycin (RP7293) is a natural antibiotic isolated from cultures of a soil organism,
Streptomyces pristinaespiralis. Pristinamycin is an association of two groups of synergistic
components: 30 to 40% pristinamycin I (PI) and 60 to 70% pristinamycin II (PII).

The chemistry of pristinamycins has recently been reviewed (ref. 1). The mode of action and
the pharmacology of these molecules have been extensively studied. Pristinamycins are mainly
used in the treatment of staphylococcal infections.

PI and PII are chemically different. Pristinamycin I is a mixture of three peptidic
macrolactones of six amino acid residues. The crystal structure of the major component
(80-95%), PIA (RP12535), has recently been determined in our group (ref. 2), and we are
performing NMR experiments (ref. 3) in order to model the molecular structure in solution.

Natural pristinamycin II is a mixture of two polyunsaturated macrolactones. The major
component (90-97%) is pristinamycin IIA (RP12536). This molecule was found to be identical
to virginiamycin M1, whose X-ray structure is already known (ref. 4).

In this paper, we present a structure of pristinamycin IIA in apolar solvent. Starting from
the X-ray structure, the model was derived from NMR data obtained in
deuterochloroform solution, using restrained molecular dynamics simulations and classical
minimisation techniques.

A set of 25 proton/proton distance constraints was taken into account in the calculations,
allowing the 23 dihedral-angle degrees of freedom to be determined.

METHODS 
NMR  experiments 

All experiments were performed on a 2 × 10^-3 M solution of pristinamycin IIA in deuterated
chloroform. Overhauser effects were measured by 1D and 2D NMR on Bruker AM400 and
WM250 spectrometers. Mixing times varied between 50 and 250 ms for the 2D
experiments and between 100 and 1000 ms for the 1D experiments.

Internuclear distances were calculated by the ISPA (isolated spin pair approximation) method
(ref. 5). The distance between protons H11 and H13 was used as a reference. Because of the
conjugation between the two double bonds, this distance is fixed at 2.23 Å.
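Under the isolated spin pair approximation, the cross-relaxation rate varies as r^-6. An unknown distance therefore follows from the ratio of its NOE build-up rate to that of the reference pair: r_ij = r_ref (σ_ref / σ_ij)^(1/6). This minimal Python sketch assumes that the NOE values are initial build-up rates (the linear regime at short mixing times). The function name and the example numbers are hypothetical.

```python
def ispa_distance(noe_ij, noe_ref, r_ref=2.23):
    """Isolated spin pair approximation: r_ij = r_ref * (noe_ref / noe_ij) ** (1/6).

    noe_ij, noe_ref: cross-relaxation rates (or initial NOE build-up slopes),
    r_ref: calibration distance in angstroms (H11-H13 = 2.23 A in the text).
    """
    if noe_ij <= 0 or noe_ref <= 0:
        raise ValueError("NOE rates must be positive")
    return r_ref * (noe_ref / noe_ij) ** (1.0 / 6.0)

# Hypothetical cross peak building up 64 times more slowly than the reference
print(ispa_distance(noe_ij=1.0, noe_ref=64.0))
```

Because of the sixth-root dependence, a 64-fold weaker build-up only doubles the distance, giving 4.46 Å here. This is also why modest intensity errors translate into fairly narrow distance error bars.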

The  list  of  the  25  proton/proton  distances,  determined  by  the  ISPA  method  and  used  for  the 
simulations,  is  given  in  Table  1. 


TABLE 1

Proton/proton distance constraints rHH and estimated errors Δr.

 n    H-H           rHH    Δr-    Δr+
 1    11-20         3.3   -0.2   +0.5
 2    11-8          2.5   -0.2   +0.2
 3    11-9          2.6   -0.2   +0.3
 4    11-9'         3.3   -0.2   +0.5
 5    11-(33)       3.7   -0.2   +0.7
 6    20-10         3.5   -0.2   +0.6
 7    20-13         4.3   -0.2   +0.8
 8    20-14         5.2   -0.2   +1.2
 9    20-(24)       3.8   -0.2   +0.7
10    20-9          4.5   -0.2   +0.7
11    20-9'         3.4   -0.2   +0.5
12    20-(33)       4.7   -0.2   +0.9
13    17-20         4.2   -0.2   +0.8
14    26-(32)       4.5   -0.2   +0.9
15    26-(30,31)    4.1   -0.2   +0.8
16    6-8           3.0   -0.2   +0.4
17    6-3           3.1   -0.2   +0.4
18    6-4           2.6   -0.2   +0.3
19    6-32          3.8   -0.2   +0.8
20    6-(30,31)     4.2   -0.2   +0.8
21    5-8           2.4   -0.2   +0.2
22    5-3           3.3   -0.2   +0.5
23    5-9           3.3   -0.2   +0.5
24    5-9'          4.1   -0.2   +0.8
25    5-(32)        3.4   -0.2   +0.5

n : NOE constraint number.
The distances are given in Å.




Computational  procedure 

All the calculations were performed using the INSIGHT/DISCOVER package (ref. 6).
The results were analysed using various programs developed in our group (ref.
7) and derived from the MANOSK software (ref. 8).

The potential energy function has the classical analytical form, but this expression also contains
"off-diagonal" or "cross" terms. These terms were not used in our calculations, since too
many parameter values for non-peptidic internal coordinates are missing. For the same reason, we used a
simple harmonic potential instead of a Morse potential for the bond-stretching term of the
energy.

In order to take the NOE distance constraints into account, DISCOVER can use a special
term, a skewed biharmonic function of the form:

$$
E_{res} =
\begin{cases}
C_1 (r - r_0)^2 & \text{if } r > r_0 \\
C_2 (r - r_0)^2 & \text{if } r < r_0
\end{cases}
\qquad (1)
$$

with:

$$
C_1 = \frac{kTS}{2(\Delta r^+)^2}, \qquad C_2 = \frac{kTS}{2(\Delta r^-)^2}
$$

and:

C1, C2 : force constants
r0 : target distance
r : calculated distance
T : absolute temperature
k : Boltzmann constant
S : scale factor
Δr+ : positive estimated error
Δr- : negative estimated error
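Equation (1) translates directly into a function. In this Python sketch the Boltzmann constant is taken per mole, that is, as the gas constant R ≈ 1.987 × 10^-3 kcal mol-1 K-1. Energies then come out in kcal mol-1 when distances are in Å. This unit choice and the function names are our assumptions and do not reflect DISCOVER's own interface.

```python
R_KCAL = 1.987204e-3  # gas constant (molar Boltzmann constant), kcal mol^-1 K^-1

def force_constant(delta_r, temperature=300.0, scale=2.0):
    """C = k T S / (2 * delta_r**2), in kcal mol^-1 A^-2 when delta_r is in A."""
    return R_KCAL * temperature * scale / (2.0 * delta_r ** 2)

def noe_restraint_energy(r, r0, dr_minus, dr_plus, temperature=300.0, scale=2.0):
    """Skewed biharmonic NOE restraint of equation (1).

    dr_minus and dr_plus are the magnitudes of the negative and positive error bars;
    C1 (from dr_plus) applies above the target, C2 (from dr_minus) below it.
    """
    dev = r - r0
    if dev > 0:
        c = force_constant(dr_plus, temperature, scale)
    else:
        c = force_constant(dr_minus, temperature, scale)
    return c * dev ** 2

# Constraint 1 of Table 1 (H11-H20): r0 = 3.3 A, dr- = 0.2 A, dr+ = 0.5 A
print(noe_restraint_energy(3.5, 3.3, 0.2, 0.5))  # overshoot by 0.2 A
print(noe_restraint_energy(3.1, 3.3, 0.2, 0.5))  # undershoot by 0.2 A
```

With T = 300 K, S = 2 and Δr = 0.2 Å, `force_constant` gives about 14.9 kcal mol-1 Å-2, consistent with the force constant of 15 quoted for the RMD runs. Because the wider positive error bar gives a smaller C1, a distance that overshoots the target is penalised less than one that undershoots it by the same amount.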


All 35 hydrogen atoms were treated explicitly in the calculations. However, when
NOE distance constraints refer to a methyl or isopropyl group, DISCOVER's special
facility can be used to define average atoms, whose coordinates are the arithmetic average of
those of all the hydrogen atoms of the group. In the case of PIIA, 4 average atoms were defined, for CH3(32),
CH3(33), C(CH3)2(30,31) and CH2(24).
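An average (pseudo) atom is simply the centroid of the hydrogens of the group, and NOE distances involving the group are then measured to that point. This minimal NumPy sketch uses hypothetical methyl-hydrogen coordinates in Å. The helper names are illustrative.

```python
import numpy as np

def pseudo_atom(h_coords):
    """Average atom: arithmetic mean of the hydrogen coordinates of a group."""
    return np.mean(np.asarray(h_coords, dtype=float), axis=0)

def distance(a, b):
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

# Hypothetical methyl hydrogens (angstroms), arranged symmetrically about the z axis
methyl_h = [[1.0, 0.0, 0.0], [-0.5, 0.866, 0.0], [-0.5, -0.866, 0.0]]
center = pseudo_atom(methyl_h)  # lies on the symmetry axis, here at the origin
print(center, distance(center, [0.0, 0.0, 3.0]))
```

Using the centroid avoids having to decide which of the (rapidly rotating) methyl protons is responsible for the observed NOE.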

The Consistent Valence Force Field (CVFF) from the BIOSYM package was chosen (ref. 6)
for the dynamics calculations with DISCOVER. Some parameters were modified, and the missing
ones were determined and adjusted using minimisation of the X-ray crystal structure and
comparison with other force fields. Indeed, as shown in Fig. 1, pristinamycin IIA contains an
oxazole nucleus incorporated into the macrolactone structure, which is relatively uncommon.
The determination of the force field parameters for this part of the molecule therefore
required special studies.

Starting from the conformation of PIIA in the crystal structure (Fig. 2), the structure was
relaxed by an energy minimisation of 100 steps of steepest descent (SD) followed by 453
iterations of quasi-Newton-Raphson (VA09 minimizer in DISCOVER), down to a final energy of
76.8 kcal mol-1. This conformation is denoted XRMIN.

Then,  starting  from  XRMIN,  the  conformational  space  was  searched  using  restrained 
dynamics  calculations  at  high  temperature,  in  order  to  generate  various  starting  conformations 
for  further  restrained  molecular  dynamics  simulations. 


The following simulation procedure (RMD) was performed 100 times:

- 2 ps (2000 steps of 1 fs) of dynamics at 900 K, with velocity randomisation and a temperature relaxation time τT = 0.01 ps

- 2 ps of dynamics at 300 K, τT = 0.1 ps

- minimisation and storage of the conformation
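The temperature relaxation time τT sets how strongly the system is pulled toward the target temperature. Suppose the coupling is a weak-coupling (Berendsen-type) scheme. We have not verified that this is the exact algorithm in DISCOVER. Under that assumption, each step rescales velocities by λ = [1 + (Δt/τT)(T0/T − 1)]^(1/2). The Python sketch below applies only this rescaling, with no forces. It shows why τT = 0.01 ps couples much more tightly than τT = 0.1 ps for a 1 fs step. The 900 K to 300 K example is illustrative.

```python
import math

def berendsen_lambda(t_current, t_target, dt_ps, tau_ps):
    """Weak-coupling velocity scaling factor (assumed Berendsen form)."""
    return math.sqrt(1.0 + (dt_ps / tau_ps) * (t_target / t_current - 1.0))

def relax(t_start, t_target, dt_ps, tau_ps, n_steps):
    """Apply the rescaling alone (no dynamics) to expose the relaxation time."""
    t = t_start
    for _ in range(n_steps):
        lam = berendsen_lambda(t, t_target, dt_ps, tau_ps)
        t *= lam ** 2  # kinetic temperature scales with velocity squared
    return t

# Ten 1 fs steps (dt = 0.001 ps) pulling a 900 K system toward 300 K
print(relax(900.0, 300.0, 0.001, 0.01, 10))  # tight coupling, tau = 0.01 ps
print(relax(900.0, 300.0, 0.001, 0.1, 10))   # loose coupling, tau = 0.1 ps
```

Each step removes a fraction Δt/τT of the temperature deviation. The short τT is therefore suited to the velocity-randomised high-temperature phase, and the longer one perturbs the 300 K dynamics less.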

All the NOE distance constraints were included in all the steps of the simulation with a scale
factor S of 2, leading to a force constant of 15 kcal mol-1 Å-2 at 300 K for a Δr of 0.2 Å (see
equation 1). In this procedure, the conformation at the end of each minimisation is the starting
conformation of the next cycle.

A set of 100 conformations was obtained (denoted RMDi, i = 1,...,100), and after analysis several
families of conformations (denoted Fj, j = 1,...,n) were derived from it.

Finally, for each family, starting from its most representative conformation, the following
procedure (FRMD) was performed:

- 10 ps of dynamics at 300 K, with velocity randomisation, τT = 0.01 ps

- a continuous restrained dynamics run of 50 ps, τT = 0.1 ps
Along the trajectory, a conformation was archived every 0.5 ps, giving a set of 100 snapshot
structures per family (FjRMDi) for analysis. These snapshot structures were
subsequently minimised with and without NOE distance constraints, for a more realistic energy
comparison with the previous calculations (RMDi) and with the crystal structure XRMIN.

During these last restrained dynamics calculations, to allow the molecule more flexibility,
Δr+ and Δr- were set to at least ±0.5 Å, and the NOE distance constraints involving
average atoms were not included.

All the calculations were performed on an FPS264 array processor connected to a VAX 8550
computer. The simulation of 1 ps for a PIIA molecule (73 atoms) took about 1 min 12 s of CPU time
on the FPS.

RESULTS  AND  DISCUSSION 

High-temperature restrained dynamics: RMD

Figure 3 shows the evolution of the potential and total (potential + restraint) energies during
the dynamics run. The potential energy varies between 81.2 and 98.7 kcal
mol-1, and the restraint energy fluctuates around 11.7 kcal mol-1. Several conformational
changes occur during the simulation, leading to different conformations. This illustrates one of
the advantages of high-temperature dynamics simulations for conformational space searching:
the kinetic energy allows the molecule to cross energy barriers.

Fig. 3. Variation of the potential (Epot) and total (Etot) energies during the RMD procedure,
plotted against frame number. The energies are given in kcal mol-1.

Statistics on the NOE distance constraints for the RMDi conformations are given in Table 2. Only
8 distance constraints are found outside the ranges defined around the target
distances, and the largest average difference is 0.42 Å. This indicates that all the generated structures
are in good agreement with the set of constraints.


TABLE 2

Statistics on distance constraints for the RMD procedure.

 n    Nout   <dif>   dmax   d<r>
 3     90    0.16    0.27   0.14
 9     91    0.21    0.38   0.18
14     51    0.22    0.91   0.10
15     62    0.15    0.33    -
16     85    0.11    0.15   0.09
20     78    0.16    0.56   0.02
23    100    0.42    0.61   0.42
25    100    0.08    0.12   0.08

n : NOE constraint number.
Nout : number of conformations for which r is outside the range R = [r0 - Δr, r0 + Δr].
<dif> : average of the differences between r and the nearest limit of R.
dmax : largest difference between r and the nearest limit of R.
d<r> : difference between the average of r and the nearest limit of R.
Distance values are given in Å.
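The Table 2 quantities can be computed for one constraint from its distances across the conformations. This Python sketch follows the definitions above, with two assumptions. First, <dif> is averaged over the out-of-range conformations only. Second, the range uses separate lower and upper error bars, as in Table 1. The distance values in the example are hypothetical.

```python
def constraint_statistics(distances, r0, dr_minus, dr_plus):
    """Violation statistics for one NOE constraint over a set of conformations.

    Returns (n_out, mean_dif, d_max, d_mean) following the Table 2 definitions.
    mean_dif is averaged over out-of-range conformations only (our assumption);
    d_mean is None when the average distance lies inside the range.
    """
    lo, hi = r0 - dr_minus, r0 + dr_plus

    def excess(r):
        # distance from r to the nearest limit of R (0 if r lies inside R)
        if r < lo:
            return lo - r
        if r > hi:
            return r - hi
        return 0.0

    outs = [excess(r) for r in distances if excess(r) > 0.0]
    n_out = len(outs)
    mean_dif = sum(outs) / n_out if n_out else 0.0
    d_max = max(outs, default=0.0)
    d_mean = excess(sum(distances) / len(distances)) or None
    return n_out, mean_dif, d_max, d_mean

# Hypothetical distances (A) for constraint 23 (5-9: r0 = 3.3, dr- = 0.2, dr+ = 0.5)
stats = constraint_statistics([4.0, 4.2, 3.5, 3.0], 3.3, 0.2, 0.5)
print(stats)
```

As in Table 2, d<r> is reported as missing ("-") when the mean distance satisfies the constraint, even if individual conformations violate it.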

The set of 100 RMDi conformations can be analysed in terms of rms values. An rms matrix
was produced by superimposing each RMDi conformation on every other one. All the
superimpositions were performed on the heavy atoms only; to avoid any influence of
the isopropyl group orientation on the rms values, the two terminal carbon
atoms of this group were excluded. The largest rms difference is 2.13 Å.
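An rms matrix of this kind needs an optimal superposition for every pair. This NumPy sketch uses the Kabsch algorithm. The source does not say which superposition method was used, so that choice is ours. The caller is assumed to pass arrays that already contain only the selected atoms (the heavy atoms minus the two terminal isopropyl carbons), listed in the same order for every conformation.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Kabsch algorithm: centre both sets, find the rotation that best maps p onto q
    from an SVD of their covariance matrix, then measure the residual.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # -1 would be a reflection: forbid it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation taking p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))

def rms_matrix(conformations):
    """Symmetric matrix of pairwise RMSDs with a zero diagonal."""
    n = len(conformations)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m[i, j] = m[j, i] = kabsch_rmsd(conformations[i], conformations[j])
    return m
```

Because the superposition removes overall translation and rotation, the rms values reflect only internal conformational differences.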

To determine correlations between conformations, an rms limit criterion can be used to
classify them into families. Taking a reasonable rms limit of 0.4 Å, we could
deduce 4 families; only 3 conformations remained uncorrelated with the others. The
conformational changes from one family to another are consistent with the energy variations
shown in Fig. 3. The major family (F1) comprises 58 conformations, which
happen to have the lowest potential energy.
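The text does not describe the exact grouping algorithm, so this Python sketch substitutes a simple greedy scheme. It repeatedly takes the conformation with the most neighbours within the rms limit and makes it a family together with those neighbours. Conformations left on their own correspond to the uncorrelated ones. The function and the example matrix are hypothetical.

```python
def cluster_by_rms(rms, limit=0.4):
    """Greedy grouping of conformations whose rms to a family centre is <= limit.

    rms: square, symmetric matrix (nested lists or array) of rms values in A.
    Returns families as sorted index lists, largest first; a family of size 1
    is a conformation not correlated with any other.
    """
    remaining = set(range(len(rms)))
    families = []
    while remaining:
        # centre = conformation with the most neighbours among those still unassigned
        centre = max(sorted(remaining),
                     key=lambda i: sum(rms[i][j] <= limit for j in remaining))
        family = sorted(j for j in remaining if rms[centre][j] <= limit)
        families.append(family)
        remaining -= set(family)
    return families

# Hypothetical 4 x 4 rms matrix (A)
rms = [[0.0, 0.2, 1.5, 0.5],
       [0.2, 0.0, 1.4, 0.3],
       [1.5, 1.4, 0.0, 1.6],
       [0.5, 0.3, 1.6, 0.0]]
print(cluster_by_rms(rms))
```

With this greedy choice, a conformation joins the first family whose centre lies within the limit. Other clustering rules, such as single or complete linkage, can produce different families from the same matrix.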

Finally, one conformation per family was selected for the subsequent analysis and
calculations. Their characteristics are summarised in Table 3. Conformations
F1 and F4 differ by an rms of only 0.61 Å, whereas F3 deviates from F1 and F4 by
about 1.1 Å, and F2 is even more different.


Restrained dynamics at 300 K: FRMD

Since F4 is close to F1, the FRMD procedure was applied only to the F1, F2 and F3 structures.
The 3 resulting sets of 100 FjRMDi conformations were analysed using energy plots and rms
matrices. In addition, "cross-rms matrices" were produced by superimposing each
conformation of family Fj on each conformation of family Fk.

Starting from three initial conformations with rms differences of 1.1 to 1.8 Å, all the
FRMD calculations converge, after about 10 ps, to the same final structure within the 0.5 Å
rms criterion. Note that the structures are not minimised during these procedures.

Finally, a restrained minimisation procedure was applied to the three conformations
F1RMD100, F2RMD100 and F3RMD100. The rms differences between the resulting structures are very
small (0.2 and 0.3 Å). A cross-rms matrix, calculated by superimposing the initial
conformations F1, F2 and F3 (before the FRMD procedure) on the final conformations
F1RMD100, F2RMD100 and F3RMD100 (after the restrained minimisation), shows that the
three families converge to a conformation close to the major family F1 (Table 4).
We therefore conclude that the PIIA molecule adopts the F1 conformation in apolar solvent.


TABLE 3

Characteristics of the 4 families.

                                   rms
Fj   RMDi   Epot   Eres     F1     F2     F3
1     35    81.2   11.6      -    1.45   1.14
2     12    92.1   10.4    1.45     -    1.81
3     64    90.7    9.7    1.14   1.81     -
4     50    85.2   11.3    0.61   1.62   1.15

j : family number.
RMDi : conformation number in the RMD procedure.
Epot : potential energy.
Eres : restraint energy (25 NOE constraints).
The energies are given in kcal mol-1.
The rms differences are given in Å.
TABLE 4

Characteristics of the FjRMD100 conformations.

                                  rms
FjRMDi      Epot   Eres     F1     F2     F3
F1RMD100    79.4   6.9     0.69   1.19   1.57
F2RMD100    81.3   6.9     0.76   1.22   1.63
F3RMD100    81.6   6.7     0.63   1.26   1.52

FjRMDi : conformation number i of family Fj in the FRMD procedure.
Fj : initial conformation of family number j.
Epot : potential energy.
Eres : restraint energy (17 NOE constraints).
The energies are given in kcal mol-1.
The rms differences are given in Å.


Comparison  of  the  solution  and  crystal  structures 

Comparing conformation F1 with the X-ray structure XRMIN shows that the two
conformations are very close: the overall rms difference for the heavy atoms of the
backbone (without the two carbon atoms of the isopropyl group) is about 0.55 Å. The largest
deviations occur around the C5-C6 bond and around the hydroxyl group O36. This is consistent
with features of the crystal structure: there, the hydroxyl group is engaged in an
intermolecular hydrogen bond, and the measured length of the C5-C6 bond (1.14 Å) is shorter
than that of a normal covalent bond. This last observation was
interpreted (ref. 4) as a rocking of C5-C6 in the plane defined by C4-C5-C6-C7.

In the XRMIN and F1 structures, the orientation of the C5-C6 bond differs by about 180
degrees. To check the flexibility of PIIA in this part of the molecule, a 50 ps dynamics run
at 300 K (without distance constraints) was performed, starting from the XRMIN
structure. After a few ps, an inversion of the dihedral angle C4-C5-C6-C7 about the C5-C6 bond
is observed.
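An inversion like this can be detected by computing the dihedral angle C4-C5-C6-C7 for each archived snapshot and looking for a jump of about 180 degrees. This NumPy sketch uses a standard atan2-based dihedral formula. The test coordinates are hypothetical points, not PIIA atoms.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    v = b0 - (b0 @ b1) * b1  # component of b0 perpendicular to the central bond
    w = b2 - (b2 @ b1) * b1  # component of b2 perpendicular to the central bond
    x = v @ w
    y = np.cross(b1, v) @ w
    return float(np.degrees(np.arctan2(y, x)))

# Hypothetical points: central bond along z, end atoms placed cis then trans
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1]))   # cis
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1]))  # trans
```

Using atan2 instead of an arccos of the dot product keeps the sign of the angle. This is what distinguishes a genuine rotation about C5-C6 from a small oscillation around 0 or 180 degrees.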

Finally, the intramolecular hydrogen bond between N8 and O38
(H···O = 2.1 Å) is observed both in solution and in the crystal. This hydrogen bond induces a
compact and stable structure.

CONCLUSIONS 

Starting from the crystal structure and from 25 proton/proton distance constraints determined
by NMR experiments, and using restrained molecular dynamics simulations at high temperature,
we derived three families of conformations which satisfy the set of NOE constraints. Dynamics
simulations at 300 K, performed for one conformation of each family, then converged
towards a unique conformation, which is very close to that observed by X-ray
crystallography.

Pristinamycin IIA is a good test for refining our molecular modelling strategy
and for choosing force fields, since various kinds of experimental information are
available for this molecule: X-ray structure, NMR data, and also laser Raman
spectroscopy measurements in the solid state (ref. 9).

We are therefore pursuing the present work by investigating different approaches
for obtaining more accurate force field parameters. The first relies on "ab
initio" calculation of the molecular orbital energy surface, using a method developed by the
Biosym Force Field Consortium (ref. 10). The second is based on the calculation of
vibrational frequencies and their comparison with the experimental Raman vibrational
frequencies (ref. 9).
ARTIFICIAL INTELLIGENCE IN THE MOLECULAR WORLD


M.C.  HATON 

CRIN/INRIA,  University  of  Nancy  I,  BP  239,  54506  Vandoeuvre-les-Nancy  (France) 


SUMMARY 

Various aspects of AI may enter the molecular world: knowledge and reasoning
representation, heuristic problem solving, planning, image and natural language
understanding, knowledge-based and expert systems, "intelligent" computer-aided
instruction, and symbolic learning. These aspects will be discussed, focusing on the
specific problems the chemist or physicochemist has to face (synthesis,
experimental planning, structure elucidation, spectra interpretation, classification,
information retrieval, fault diagnosis in analytical instrumentation, student education
and so on). Some examples, not intended to be exhaustive, will be given.


INTRODUCTION 

While solving problems is a long tradition in the experimental sciences, research into
how people actually solve such problems is relatively recent. Twenty years ago it was
assumed that people knew how to solve problems and, above all, knew
how to teach their solving methods.

Reflection on this question has recently deepened within the cognitive sciences, namely
among knowledge psychologists and now among knowledge engineers, in a
branch of advanced information processing named "Artificial Intelligence" (AI). AI
deals with the activities which characterize human behaviour: knowledge
acquisition and structuring, reasoning, perception, decision making, etc. AI methods,
which tend to formalize human knowledge and reasoning processes, give us new tools
for a better understanding of these processes.

Dealing  with  molecules  is  a  complex  process,  which,  paradoxically,  is  nowadays 
complicated  by  the  fact  that  we  can  have  more  and  more  detailed  representations  and 
more  and  more  numerous  data  concerning  them.  While  the  use  of  computers  is 
increasing,  the  amount  and  the  complexity  of  the  data  involved  in  the  different 
domains  of  chemistry  make  it  imperative  to  discriminate  which  information  is  relevant 
and  to  consider  new  ways  of  processing  it.  The  chemist  is  often  able  to  extract  this 
relevant  information  and  to  make  decisions  thanks  to  his  knowledge,  his  experience, 
his  intuition  and  so  on. 

AI may help the specialists in these tasks, together with many classical methods
of data processing.
WHAT IS ARTIFICIAL INTELLIGENCE ?

A  definition 

AI may be considered along two complementary axes. The first deals with
the study of the mechanisms of human intelligence, the computer being used as a
simulation tool to test a model or a theory. The second, more pragmatic, is
concerned with the efforts made to give the computer some abilities usually
associated with human intelligence (knowledge acquisition and structuring,
perception, reasoning, decision making, etc.). Most of the time it is this second point of
view that is adopted: it consists in emulating some intelligent behaviours with a computer
program, without trying to reproduce the corresponding functioning of the human
being. Seen in this way, AI appears to be an advanced branch of computer science, but one
connected with many other domains.

AI is presently a real science, for both academic reasons (several thousand
researchers and teachers all around the world) and economic ones (well-mastered
applications, specialized companies, important research and development projects) (2).

Main  characteristics 

Though the subjects of interest are various and numerous, AI systems share some
common features.

First, they handle symbolic information together with numbers. This type of
information represents concepts, rules, objects or facts similar to those used by
human beings when reasoning. Non-classical programming styles, such as logic
programming with PROLOG, functional programming with ML or LISP, and object-oriented
languages, are devoted to this task. We may also mention the new trends in computer
architectures derived from the study of human memory and reasoning.

Second, they use heuristic methods, as opposed to classical
algorithmic ones. Heuristics allow reasoning in a non-deterministic way whose success
is not guaranteed but which, when all goes well, gives a good solution
to the problem in question while saving time. In case of failure, the search backtracks and tries
another solution. The heuristic search for a solution often consists in "pruning" the set
of possible solving paths so as to consider only the most promising ones. Heuristic
methods make it possible to deal with two types of problems that cannot be treated by
classical methods:

- problems for which no algorithmic solution is known. This is often the
case for human activities such as perception, decision making, design, etc.

- problems whose solving complexity exceeds the available
computing resources. A typical example is the game of chess.
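The combination of heuristic ordering, pruning and backtracking described above fits in a few lines. This Python sketch runs a depth-first search that sorts the candidate moves by a heuristic score and keeps only the `beam` best at each step (pruning). When every retained branch fails, it backtracks. The toy problem, reaching a target sum by adding 1, 3 or 5, and all names are hypothetical illustrations. They are not taken from any system discussed here.

```python
def heuristic_search(state, is_goal, successors, score, beam=2, depth=0, max_depth=10):
    """Depth-first search with heuristic ordering, pruning and backtracking.

    Returns the list of states from `state` to a goal, or None on failure.
    """
    if is_goal(state):
        return [state]
    if depth == max_depth:
        return None
    # order children by heuristic score and keep only the `beam` best (pruning)
    children = sorted(successors(state), key=score)[:beam]
    for child in children:
        path = heuristic_search(child, is_goal, successors, score, beam, depth + 1, max_depth)
        if path is not None:
            return [state] + path
    return None  # every retained branch failed: backtrack to the caller

# Toy problem: reach exactly 11 by adding 1, 3 or 5 to a running total
TARGET = 11
path = heuristic_search(
    0,
    is_goal=lambda s: s == TARGET,
    successors=lambda s: [s + c for c in (1, 3, 5) if s + c <= TARGET],
    score=lambda s: TARGET - s,  # prefer states closest to the target
)
print(path)
```

Pruning makes the search fast but, as noted above, success is not guaranteed. A solution that lies only on a discarded branch will never be found.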

AI may also deal with missing or approximate data, which corresponds to usual
situations, for instance in medicine, company management and so on. Even if it does
not entirely solve these problems, AI provides efficient methods, such as approximate or
non-monotonic reasoning.

Another important feature of AI is the concept of knowledge. In contrast with
the first researchers, who tried to develop "general problem solvers" (3), AI presently
designs systems able to solve problems in limited domains, taking into account a
great amount of domain-specific knowledge. Once more, this feature can be observed in
human activities: human expertise relies on that type of knowledge together with
experience, intuition, etc. Knowledge-based systems and expert systems take
a large place inside AI.

Finally, AI needs to draw on various domains besides computer
science: logic, cognitive psychology, linguistics, cybernetics, neurosciences, etc.

The domains of AI

Briefly,  we  mention  automatic  theorem  proving,  natural  language  processing, 
automatic  speech  understanding,  image  interpretation  and  computer  vision,  robotics, 
games  and  expert  systems. 

KNOWLEDGE  AND  REASONING  REPRESENTATION 
Introduction 

The word "knowledge" includes all types of human knowledge: objects of
the real world, facts and events, larger concepts corresponding to groupings or
generalizations of basic ones, relations between concepts, heuristics and know-how
strategies, reasoning procedures, etc. The word "metaknowledge" relates to the
confidence that we associate with a piece of knowledge.

Knowledge may be problem-specific, domain-specific, shallow, deep, exact, approximate,
etc.

Knowledge representation consists in translating knowledge into a symbolic form that can
be understood by a reasoning system.

A representation mode associates a data structure, which represents the useful
information, with the procedures that manipulate it. The reasoning mechanism makes it
possible to discover dynamically new pieces of information or knowledge about the
problem. This feature distinguishes a knowledge base from a database, from which it is
only possible to extract pieces of information that have been explicitly entered.

Declarative and procedural knowledge are sometimes contrasted. This duality corresponds
to the distinction between a descriptive representation of knowledge ("what") and the
representation of know-how ("how to do"). The present trend is to develop mixed
representation modes.
The knowledge representation techniques

They can be divided into several categories:

- the logical representations: they rely upon the propositional calculus (zero-order
logic) or the first-order predicate calculus (first-order logic). Reasoning consists in
deriving new formulas from the initial ones. For that purpose, a set of inference rules
is available, such as the "resolution" rule. Moreover, there exist non-standard logics
for dealing with ambiguous or fuzzy knowledge,

- the semantic networks, first used for the interpretation of natural language. They
are graphs in which the nodes represent concepts and the arcs represent semantic
relations between these concepts. Reasoning consists in matching the network
representing the problem with the knowledge network by means of a pattern-matching
procedure. The matching is rarely immediate and often requires what is called "the
inheritance of properties" from a general concept to a more particular one,

- the production rules, which are pieces of knowledge of the form:

if conditions then conclusions and/or actions (coefficient),

which means that, if the premises (condition side) are valid, it is possible to draw
some conclusions, start an action, run a computation or display procedure, etc. The
coefficient, if present, indicates the confidence in the rule or its likelihood. It makes
it possible to manage uncertainty and generally comes from the experience of the expert.

The reasoning process is carried out by an interpreter in two possible ways:

. forward or data-driven chaining, which considers the rules from left to right and
draws conclusions from the current data,

. backward or goal-driven chaining, which, in order to prove the right-hand side of a
rule, tries to prove the elements of its left-hand side.

These two chainings may be performed alternately in a mixed approach, well suited to
reproducing human reasoning, which is at first based on data and then directed by
hypotheses,

- procedural representations. They make it possible to introduce classical algorithmic
procedures into AI systems,

- the object-oriented representations. They arise from the need to organize the set of
available knowledge. Various formalisms have been developed (frames and scripts,
prototypes, etc.) that led to the concept of structured objects. These objects are now
the basis of object-oriented programming languages.
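The production-rule formalism and its two chaining modes, described in the list above, can be illustrated with a minimal Python sketch. It assumes purely propositional rules (a set of premises and a single conclusion, without coefficients); the rule contents such as `has_carbonyl` are hypothetical placeholders, not taken from any system cited here.

```python
from typing import FrozenSet, List, Set, Tuple

# A production rule is (premises, conclusion). These rules are hypothetical
# placeholders chosen for illustration, not taken from any cited system.
Rule = Tuple[FrozenSet[str], str]

RULES: List[Rule] = [
    (frozenset({"has_carbonyl", "has_oh_on_carbonyl"}), "carboxylic_acid"),
    (frozenset({"carboxylic_acid"}), "acidic"),
    (frozenset({"has_carbonyl", "carbonyl_at_chain_end"}), "aldehyde"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Data-driven chaining: fire rules whose premises hold until nothing new appears."""
    known = set(facts)  # copy, so the caller's data is not modified
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal: str, facts: Set[str], rules: List[Rule],
                   in_progress: FrozenSet[str] = frozenset()) -> bool:
    """Goal-driven chaining: prove `goal` by proving every premise of a rule concluding it."""
    if goal in facts:
        return True
    if goal in in_progress:  # guard against cyclic rule sets
        return False
    for premises, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(p, facts, rules, in_progress | {goal}) for p in premises
        ):
            return True
    return False


if __name__ == "__main__":
    data = {"has_carbonyl", "has_oh_on_carbonyl"}
    print(sorted(forward_chain(data, RULES)))
    print(backward_chain("acidic", data, RULES))    # True
    print(backward_chain("aldehyde", data, RULES))  # False
```

Forward chaining derives everything that follows from the data, whereas backward chaining only explores the rules needed for the goal it is asked to prove; the `in_progress` guard is one simple way, among others, of avoiding infinite loops on cyclic rule sets.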

HEURISTIC  PROBLEM  SOLVING  -  PLANNING 

It is now accepted that the memorization and computing abilities of a computer are not
sufficient to solve complex problems. For complex games requiring strategies, for
planning (determining the sequence of actions to be performed to reach a given goal),
etc., it is necessary to use specific techniques that depend on several factors: whether
the best solution is needed, whether backtracking is possible, whether the effect of an
action can be forecast, whether the problem can be divided into easier sub-problems, and
so on. Different modes are available to represent the problem-solving process
(state-space trees, AND-OR trees). These virtual resolution trees are never fully
developed, since they are generally enormous. Heuristics are used to develop first the
branches that have some chance of leading to a good solution.
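The role of heuristics in developing only the promising branches can be sketched as a greedy best-first search, which always expands the open node with the lowest heuristic estimate. This is one possible technique among those the text alludes to (A* or AND-OR search would be others), and the toy problem in the usage lines (reach 10 from 1 with the operators +1 and *2, guided by the distance to 10) is purely hypothetical.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Hashable, Iterable, List, Optional


def best_first_search(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    successors: Callable[[Hashable], Iterable[Hashable]],
    h: Callable[[Hashable], float],
    max_expansions: int = 10_000,
) -> Optional[List[Hashable]]:
    """Greedy best-first search: always develop the open node with the lowest h."""
    tie = count()  # tie-breaker so heapq never has to compare states
    open_nodes = [(h(start), next(tie), start)]
    parent: Dict[Hashable, Optional[Hashable]] = {start: None}
    expansions = 0
    while open_nodes and expansions < max_expansions:
        _, _, state = heapq.heappop(open_nodes)
        if is_goal(state):
            path = [state]
            while parent[path[-1]] is not None:  # walk back to the start state
                path.append(parent[path[-1]])
            return path[::-1]
        expansions += 1
        for nxt in successors(state):
            if nxt not in parent:  # each state is generated only once
                parent[nxt] = state
                heapq.heappush(open_nodes, (h(nxt), next(tie), nxt))
    return None  # open list exhausted or expansion budget spent


if __name__ == "__main__":
    path = best_first_search(1, lambda n: n == 10, lambda n: (n + 1, n * 2),
                             lambda n: abs(10 - n))
    print(path)  # [1, 2, 4, 8, 9, 10]
```

Greedy best-first search gives no guarantee of finding the best solution; when optimality matters, the evaluation function would also have to include the cost already spent (as in A*).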

IMAGE  INTERPRETATION 

Computer vision aims at interpreting an image or a scene and at linking this
interpretation to an action upon the environment. A vision system includes modules
performing image processing and pattern recognition. It is based on two main ideas: the
need for new architectures for parallel computation, and the existence of a hierarchy of
representations, from the physical signals up to their cognitive interpretation.

NATURAL  LANGUAGE  (NL)  UNDERSTANDING 

Natural languages, such as French or English, are the preferred means of interaction
between humans and machines. Since access to data in a database otherwise requires a
formal artificial language, being able to interact with the machine in a natural way (to
ask a question, or to receive an answer or a reformulated text) constitutes an important
step. NL understanding concerns both written and spoken language.

KNOWLEDGE-BASED  AND  EXPERT  SYSTEMS 
Introduction

The current success of expert systems (ES) mainly stems from their ability to carry out
tasks such as diagnosis and decision making, which had previously been little
computerized.

The advent of ES in the early 70s corresponds to a global evolution of Artificial
Intelligence, from the search for general problem-solving techniques to the study of the
mechanisms used by a human expert to solve a problem in a given field. A pioneering
project was DENDRAL, launched in 1965 by E. Feigenbaum and his colleagues at Stanford
University. DENDRAL helps the chemist to infer the structure of an organic compound from
its mass spectrum and its molecular formula {4}.

This project clearly showed the value of separating the knowledge needed to solve a
problem in a given field (e.g. organic chemistry) from the general mechanisms of
knowledge handling. The idea of knowledge-based systems lies precisely in this
separation. Several projects then extended this basic idea: MYCIN (Shortliffe, 1976, at
Stanford University {5}) for the diagnosis of some bacterial blood infections, which
greatly influenced the design of a large number of ES; MOLGEN (Stefik, at Stanford
University) {6} for the planning of experiments in molecular genetics; and PROSPECTOR
(Duda and Hart, 1979, at the Stanford Research Institute) in the field of geology {7}.

Since then, a large number of ES have been developed and are routinely used. An expert
system is a set of computer programs that reaches the performance of a human expert on
a precise task in a restricted domain, through the use of a knowledge base extracted
from experts in that domain.

Second-generation ES, based on deep knowledge of the application field, are now
appearing.

Basic  architecture  of  an  expert  system 

An ES is a particular case of knowledge-based system. It basically includes three
parts:

- one or several knowledge bases, which constitute the long-term memory of the system
and store, possibly using several formalisms as seen previously, the various pieces of
knowledge necessary for solving a problem: permanent facts, know-how, rules of thumb,
common-sense knowledge, etc. The formalism of production rules has been widely used in
ES, since it is well suited to representing the way in which an expert solves a problem,
especially in domains like diagnosis,

- a fact base, which contains the data and facts related to the problem to be solved.
It constitutes the short-term working memory of the ES,

-  an  inference  engine,  or  interpreter,  in  charge  of  reasoning  about  a  problem  by 
exploiting  the  facts  and  the  knowledge  available. 

An inference engine works according to a particular search strategy. In simple cases,
the strategy may consist only in exhaustively exploring all potential solutions. It is,
however, usually necessary to design more sophisticated mechanisms, possibly based on
meta-knowledge (such as the metarules used in MYCIN), in order to reach a solution in a
reasonable time.

The explicit separation between the knowledge base and the inference engine is useful
for implementing practical knowledge-based applications (in particular, it allows the
incremental design of large systems and makes knowledge bases relatively easy to update
and maintain).
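The benefit of separating the knowledge base, the fact base and the inference engine can be shown with an engine that knows nothing about any particular domain and is reused unchanged with two knowledge bases. The sketch also attaches a coefficient to each rule. As assumptions not taken from the text: each rule fires at most once, a conclusion receives the rule coefficient times the weakest premise confidence, and two positive confidences for the same conclusion are merged with the formula c1 + c2(1 - c1) popularized by MYCIN. Both knowledge bases are invented examples.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    premises: Tuple[str, ...]
    conclusion: str
    coefficient: float  # expert's confidence in the rule, between 0 and 1


def run_engine(knowledge_base: List[Rule], fact_base: Dict[str, float]) -> Dict[str, float]:
    """Domain-independent forward-chaining engine with confidence coefficients.

    Simplifications (assumptions of this sketch): each rule fires at most once,
    a conclusion's confidence is the rule coefficient times the weakest premise,
    and two positive confidences are merged MYCIN-style: c1 + c2 * (1 - c1).
    """
    facts = dict(fact_base)  # working memory; the caller's fact base is not modified
    fired = set()
    progress = True
    while progress:
        progress = False
        for i, rule in enumerate(knowledge_base):
            if i in fired or not all(p in facts for p in rule.premises):
                continue
            fired.add(i)
            progress = True
            c = rule.coefficient * min((facts[p] for p in rule.premises), default=1.0)
            old = facts.get(rule.conclusion, 0.0)
            facts[rule.conclusion] = old + c * (1.0 - old)
    return facts


# Two hypothetical knowledge bases for two unrelated domains, same engine.
SPECTRA_KB = [
    Rule(("strong_ir_band_near_1700",), "carbonyl", 0.8),
    Rule(("carbonyl", "broad_oh_band"), "carboxylic_acid", 0.9),
]
INSTRUMENT_KB = [
    Rule(("no_signal",), "detector_fault", 0.6),
    Rule(("noisy_baseline",), "detector_fault", 0.5),
]

if __name__ == "__main__":
    print(run_engine(SPECTRA_KB, {"strong_ir_band_near_1700": 1.0, "broad_oh_band": 0.5}))
    print(run_engine(INSTRUMENT_KB, {"no_signal": 1.0, "noisy_baseline": 1.0}))
```

Updating either knowledge base (adding or editing a `Rule`) requires no change to `run_engine`, which is the practical point of the separation. In the second run, the two independent pieces of evidence raise the confidence in `detector_fault` to about 0.8.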

Most present systems use production rules to code their knowledge bases, partly or
entirely. This offers several advantages:

-  readability  of  knowledge, 

- ease of building up the base, since production rules are independent of one another,

-  possibility  for  the  system  to  explain  its  reasoning. 
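The third advantage, the ability to explain the reasoning, follows directly from rule independence: if the engine records which rule produced each fact, a "how?" question can be answered by unwinding that record. The following Python sketch assumes named propositional rules; `R1`, `R2` and the fact names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical named rules: (rule name, premises, conclusion).
RULES: List[Tuple[str, Set[str], str]] = [
    ("R1", {"peak_at_M_minus_15"}, "loses_methyl"),
    ("R2", {"loses_methyl", "formula_has_oxygen"}, "structure_class_X"),
]

Trace = Dict[str, Tuple[str, Set[str]]]


def forward_chain_with_trace(facts: Set[str],
                             rules: List[Tuple[str, Set[str], str]]) -> Tuple[Set[str], Trace]:
    """Forward chaining that records which rule (and premises) produced each new fact."""
    known: Set[str] = set(facts)
    trace: Trace = {}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                trace[conclusion] = (name, premises)
                changed = True
    return known, trace


def explain(fact: str, trace: Trace, depth: int = 0) -> List[str]:
    """Answer a 'how was this concluded?' question by unwinding the trace recursively."""
    pad = "  " * depth
    if fact not in trace:
        return [f"{pad}{fact}: given as input data"]
    name, premises = trace[fact]
    lines = [f"{pad}{fact}: concluded by rule {name} from {', '.join(sorted(premises))}"]
    for p in sorted(premises):
        lines.extend(explain(p, trace, depth + 1))
    return lines


if __name__ == "__main__":
    _, trace = forward_chain_with_trace({"peak_at_M_minus_15", "formula_has_oxygen"}, RULES)
    print("\n".join(explain("structure_class_X", trace)))
```

The printed explanation shows `structure_class_X` concluded by R2, with `loses_methyl` in turn concluded by R1 from the input data, which is the kind of trace an expert or knowledge engineer can check line by line.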
Object-oriented languages now offer a promising way of coding knowledge and
incorporating reasoning mechanisms.

Typology  of  problems 

ES can help solve problems with the following characteristics:

-  a  large  body  of  knowledge,  possibly  incomplete  and  uncertain,  is  necessary  for 
solving  a  problem, 

-  the  application  field  may  evolve  in  time, 

-  the  resolution  of  a  problem  is  basically  heuristic, 

- symbolic processing plays a major role in the problem-solving process.

Problems tackled by ES can be divided into three main classes:

- data interpretation: the problem consists in examining a set of data and/or physical
signals in order to interpret them. Typical examples are diagnosis (in medicine,
industry, finance and banking, etc.) and signal and image interpretation (in biology,
physicochemistry and medicine, acoustics, industry, etc.),

-  planning  :  this  domain  concerns  complex  tasks  that  can  be  found  in  activities 
like  decision  making,  administration  and  law,  resource  management,  production 
management,  robotics  and  so  on, 

- design: of objects (e.g. VLSI chips) according to preliminary specifications.

Finally, ES are also used in computer-aided instruction (CAI), owing to their ability to
make their reasoning schemes explicit and to explain them. Applications exist in several
domains: medicine (learning how to make a diagnosis), industry (teaching instructions to
process controllers), and academic teaching (mathematics, physics, foreign languages,
etc.).

Domains  of  application 

The different categories of ES that have just been presented can be found in very varied
domains of application. The following list is not exhaustive and includes systems at
various stages of development (first demonstrators, prototypes, operational products):
agriculture, avionics and space, banking, finance and insurance, biotechnology,
chemistry, computer science, electronics, geology, industry, law and regulation,
mathematics, medicine, military, teaching and so on.

Development  of  an  expert  system 

The decision to launch the development of an ES must be taken carefully, taking into
account several criteria:

- the problem to be solved requires not only quantitative but also qualitative
information,

- there are recognized experts in the domain who are motivated and, if possible,
available,

-  the  problem  is  of  reasonable  complexity  and  has  no  satisfactory  algorithmic 
solution. 

Justifying an ES on purely economic grounds is not easy, since the return on investment
is usually difficult to evaluate. But other factors are also important:

- the expertise is scarce or fragile,

- the expertise is concentrated in one place but used at several locations,

- decisions must be taken under stressful conditions, etc.

In the present state of AI methodology, the development of an ES is based on the
interaction between two main actors (or groups of actors):

-  an  expert  of  the  application  domain, 

-  a  knowledge  engineer. 

These two actors cooperate closely to build up the knowledge base of the ES. This is
the most crucial step in the development of an ES. It is likely that forthcoming tools
from the research field of symbolic learning will facilitate the acquisition of knowledge
from experts and its formalization {1}.

ES are software programs with a specific life cycle. The development of an ES typically
consists of four successive phases {8}:

- design of a first demonstrator: in this preliminary phase, only a subproblem within
the framework of the application is selected. The goal is only to demonstrate the
validity of an ES solution and to evaluate the overall complexity and cost of the whole
project,

- development of a prototype able to take the entire problem into account,

- integration of the final product into the environment in which it will be used. This
phase raises complex problems related more to classical computer science than to AI:
connection with other software packages and/or databases, user interfaces, real-time
aspects, etc.,

- exploitation of the ES: knowledge in a domain usually evolves over time, so the
maintenance and updating of the knowledge base are vital operations in the life of an
ES.

Development  tools 

The functionalities of an ES can be grouped into five categories:

- problem solving, which represents the basic activity of the ES, carried out by one or
several inference engines working on the knowledge base(s),

-  acquisition,  modification  and  updating  of  knowledge, 
- explanation of how a problem was solved by the ES. This function is fundamental if the
ES is used in education; it is also very useful for the expert and the knowledge
engineer to test the validity of the knowledge base during its design,

- connection of the ES with external systems: computer programs, databases, possibly
other ES,

-  interface  with  the  end  user.  This  is  an  important  point  for  a  system  to  be 
accepted  by  the  users.  It  includes  various  aspects  :  graphic  displays,  images,  natural 
language  processing,  etc. 

The complexity of an ES often makes it necessary to develop it with sophisticated
software tools rather than writing it from scratch in a programming language. A large
variety of such tools is commercially available for different classes of machines:
microcomputers, workstations, symbolic machines and mainframe computers.

Conclusion 

The use of ES is rapidly increasing in a large variety of domains, since they bring
solutions to as yet unsolved problems. However, these systems still have strong
limitations:

-  the  learning  abilities  are  modest  (the  knowledge  base  is  built  up  a  priori  and 
cannot  be  dynamically  updated  by  the  ES), 

-  the  size  of  the  domain  covered  by  the  expertise  of  an  ES  is  restricted, 

- ES mainly use surface knowledge of a domain instead of referring to deep knowledge
about the underlying phenomena,

-  reasoning  schemes  are  too  scarce  and  limited. 

Some of these limitations are beginning to be overcome in the second-generation ES that
are now appearing. These systems are characterized by some new tendencies:

-  multimodal  representation  of  knowledge,  especially  in  the  framework  of  object 
oriented  representations, 

-  use  of  deep  knowledge  in  conjunction  with  a  qualitative  modelling  of  the 
phenomena, 

-  better  integration  of  the  ES  in  existing  information  systems, 

- simultaneous use of several AI techniques: natural language processing, vision,
planning, etc.

AI IN THE MOLECULAR WORLD

From what has been developed above, it is easy to imagine the numerous applications of
AI techniques in the molecular world {9}. We can try to classify them and give some
(non-exhaustive) examples as follows:
Programming languages and aids

Programming languages (object-oriented languages, LISP, PROLOG, etc.) make it possible
to translate general and heuristic knowledge more easily than other, procedural
languages. Moreover, shells are available that are designed for building systems in
specific categories.

Knowledge  representation  -  Problem  solving 

As seen above, knowledge representation is involved in every application. The variety
of available modes makes it possible to choose the one best suited to the problem.

Heuristic state-space search techniques may be used in synthesis planning programs
({10}, {11}, {12}, {13}) and in structure elicitation ({14}, {15}, {4}, etc.).

Let us take the example of synthesis: a given state corresponds to a molecular
structure, while operators describe chemical transforms. Finding a synthesis path is
equivalent to finding, and then reversing, the "best" sequence of operators starting
from the compound to be synthesized (initial state) and leading to the necessary basic
chemical products.

Exhaustive search of the resolution tree is impossible, so heuristics must be introduced
(for example, SYNCHEM {10} performs a "best-first search" using an evaluation function
that measures structural complexity). In some implementations {11}, it has been decided
to give an important role to the chemist using the system: an interaction mode takes
into account the user's advice to cut non-productive branches, and then performs an
automatic learning procedure to improve the system.
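The retrosynthetic search described above (states are sets of molecules still to be made, operators are reversed chemical transforms, and an evaluation function measures structural complexity) can be sketched as a best-first search. This is not SYNCHEM's actual algorithm: the molecule names `T`, `X`, `Y`, `Z`, their complexity scores and the disconnection table are invented, and identical molecules are not counted twice in a state.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical toy data; molecules are just names here.
AVAILABLE = {"A", "B", "C", "D"}  # basic chemical products assumed purchasable
COMPLEXITY = {"T": 10, "X": 6, "Y": 5, "Z": 4, "A": 1, "B": 1, "C": 1, "D": 1}
# Retro-transforms: each entry undoes one reaction (product -> possible precursors).
DISCONNECTIONS: Dict[str, List[Tuple[str, ...]]] = {
    "T": [("X", "Y"), ("Z", "D")],
    "X": [("A", "B")],
    "Y": [("C", "D")],
    "Z": [("A", "C")],
}

State = FrozenSet[str]  # molecules still to be obtained (multiplicities ignored)
Step = Tuple[str, Tuple[str, ...]]  # (product, precursors)


def complexity(state: State) -> int:
    """Evaluation function: total complexity of the molecules not yet available."""
    return sum(COMPLEXITY[m] for m in state if m not in AVAILABLE)


def plan_synthesis(target: str) -> Optional[List[Step]]:
    """Best-first retrosynthetic search; returns reactions in forward (laboratory) order."""
    tie = count()  # tie-breaker so heapq never compares states
    start: State = frozenset({target})
    open_nodes = [(complexity(start), next(tie), start, [])]
    seen = {start}
    while open_nodes:
        _, _, state, steps = heapq.heappop(open_nodes)
        pending = sorted(m for m in state if m not in AVAILABLE)
        if not pending:
            return steps[::-1]  # reverse the retrosynthetic sequence
        for mol in pending:
            for precursors in DISCONNECTIONS.get(mol, []):
                nxt = (state - {mol}) | frozenset(precursors)
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(open_nodes,
                                   (complexity(nxt), next(tie), nxt, steps + [(mol, precursors)]))
    return None


if __name__ == "__main__":
    print(plan_synthesis("T"))
```

Here `plan_synthesis("T")` returns `[("Z", ("A", "C")), ("T", ("Z", "D"))]`: make Z from A and C, then T from Z and D. The route through X and Y is generated but never expanded, because its evaluation score is higher.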

Learning 

Automatic learning refers to the automatic building, by the system, of knowledge that has
not been explicitly introduced. An example is training on biological sequences (training
of filters isolating assemblies of shorter codes, training of concepts, etc.) {16}.

It is remarkable that the first ES developed, DENDRAL, already mentioned, included an
automatic learning module, Meta-DENDRAL. This module was presented with
"problem-solution" pairs (here, the molecular formula and mass spectrum on one side and
the structural formula on the other), and new spectrum interpretation rules were found
automatically, reportedly even for structures that had not yet been elucidated.

Learning may play a role in ES designed for data interpretation {17}, {18}, molecular
synthesis {10}, {11}, or the search for links between the structure and the activity of
a substance.
Structure elicitation

This concerns the determination of the composition and/or three-dimensional structure
of molecules. The first operational ES, DENDRAL {4}, was designed to find the structure
of chemical substances from their molecular formula and their mass spectrum.

The  strategy  is  a  three-phase  process  : 

. a planning phase: structural constraints are inferred from the available chemical and
spectral data,

. an algorithmic generation phase: all stereo-isomers compatible with the given
constraints are generated,

. a test phase: properties are predicted for each generated candidate (using detailed
structural/spectral relationships) and compared with the observed data to derive a
plausibility score; experiments may also be simulated to differentiate among the
remaining candidates.
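The plan-generate-test strategy can be illustrated on a deliberately crude toy model. In this hypothetical sketch, a "structure" is just a multiset of fragments with nominal masses, the planning phase reduces to two constraints (molecular mass, maximum number of oxygen-bearing fragments), and the test phase predicts peaks as the molecular ion plus the loss of each single fragment, then scores candidates by their overlap with an invented observed spectrum. Real fragmentation chemistry and DENDRAL's structure generator are far richer.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Set, Tuple

# Hypothetical fragment table with nominal masses: a crude toy model only.
FRAGMENTS: Dict[str, int] = {"CH3": 15, "CH2": 14, "O": 16, "OH": 17}
OXYGEN_BEARING = {"O", "OH"}
Candidate = Tuple[str, ...]


def plan(molecular_mass: int, max_oxygen: int) -> Dict[str, int]:
    """Planning phase (placeholder): package constraints inferred from the data."""
    return {"mass": molecular_mass, "max_oxygen": max_oxygen}


def generate(constraints: Dict[str, int], max_size: int = 4) -> List[Candidate]:
    """Generation phase: exhaustively enumerate fragment multisets meeting the constraints."""
    names = sorted(FRAGMENTS)
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations_with_replacement(names, size):
            mass = sum(FRAGMENTS[f] for f in combo)
            oxygens = sum(f in OXYGEN_BEARING for f in combo)
            if mass == constraints["mass"] and oxygens <= constraints["max_oxygen"]:
                found.append(combo)
    return found


def predicted_peaks(candidate: Candidate) -> Set[int]:
    """Toy prediction: molecular ion plus the loss of each single fragment."""
    total = sum(FRAGMENTS[f] for f in candidate)
    return {total} | {total - FRAGMENTS[f] for f in candidate}


def rank(candidates: List[Candidate], observed: Set[int]) -> List[Tuple[float, Candidate]]:
    """Test phase: plausibility = overlap (Jaccard index) of predicted and observed peaks."""
    scored = []
    for c in candidates:
        predicted = predicted_peaks(c)
        scored.append((len(predicted & observed) / len(predicted | observed), c))
    return sorted(scored, key=lambda s: s[0], reverse=True)


if __name__ == "__main__":
    candidates = generate(plan(46, max_oxygen=1))
    print(rank(candidates, observed={46, 31, 29}))  # invented "observed" peaks
```

For a mass of 46 and at most one oxygen-bearing fragment, the generator yields two candidates, `("CH2", "CH3", "OH")` and `("CH3", "CH3", "O")`, which the test phase scores 0.75 and 0.5 against the invented peaks {46, 31, 29}.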

AI techniques are applicable in the first and third phases.

A  similar  approach  has  been  applied  to  DNA  strands  {19}. 

Other systems have been developed for mass, NMR or laser spectrography {20}, {21}; in
the last case, the composition of a mineral or organic substance is determined from its
spectrum obtained by laser spectrography.

Structure-activity relationship

Some systems handle structural information arising from molecular modelling to predict
the activity of a product. The system described in {22}, for example, includes a module
that extracts a quantitative estimate of the activity of a molecule, and a training
module that refines these criteria each time a new molecule is studied.

Classification  of  molecules 

In this domain, AI provides techniques for easy symbolic handling thanks to adequate
languages, especially PROLOG, LISP and, increasingly, object-oriented languages, whose
prototype is Smalltalk. The latter give good support for representing complex objects.
The introduction of ES including problem-specific heuristics reduces the complexity of
the classification process. Thus a mixed knowledge representation mode can be found,
combining objects and production rules. Moreover, the classification process may
involve approximate reasoning, by way of theories such as possibility theory {23} or
fuzzy set theory {24}.

A demonstrator aimed at classifying photosynthesis inhibitors has been developed {25}.
The objects are defined in terms of some structural elements (presence and relative
positions of functional groups, and so on), and the reasoning rules use this topological
information to predict the inhibitory properties of the molecule.
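A mixed representation of this kind (objects plus production rules, with approximate reasoning) can be sketched as follows. The slots, the group names `group_X`, `group_Y` and `group_Z`, the class labels and the membership function are hypothetical and do not reproduce the criteria of the demonstrator {25}. Conditions are combined with min() (fuzzy AND) and alternative rules for the same class with max() (fuzzy OR), which is the usual choice in fuzzy set theory but only one of several possible operators.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Molecule:
    """Object described by structural elements (hypothetical slots)."""
    name: str
    groups: Dict[str, int] = field(default_factory=dict)  # functional group -> count
    ring_count: int = 0


def has(group: str, at_least: int = 1) -> Callable[[Molecule], float]:
    """Crisp condition returned as a fuzzy degree (0.0 or 1.0)."""
    return lambda m: 1.0 if m.groups.get(group, 0) >= at_least else 0.0


def ring_rich(m: Molecule) -> float:
    """Invented fuzzy membership: 0 rings -> 0.0, 1 ring -> 0.5, 2 or more -> 1.0."""
    return min(m.ring_count / 2.0, 1.0)


# (class label, conditions). Conditions are ANDed with min(); rules sharing a
# label are ORed with max().
RULES: List[Tuple[str, List[Callable[[Molecule], float]]]] = [
    ("class_A", [ring_rich, has("group_X")]),
    ("class_A", [has("group_Z", at_least=2)]),
    ("class_B", [ring_rich, has("group_Y")]),
]


def classify(m: Molecule) -> Dict[str, float]:
    """Degree of membership of the molecule in each class."""
    degrees: Dict[str, float] = {}
    for label, conditions in RULES:
        d = min(c(m) for c in conditions)
        degrees[label] = max(degrees.get(label, 0.0), d)
    return degrees


if __name__ == "__main__":
    print(classify(Molecule("m1", {"group_X": 1}, ring_count=1)))
```

The molecule `m1` belongs to `class_A` with degree 0.5 (its single ring only partly satisfies the fuzzy condition) and to `class_B` with degree 0.0, so the result is graded rather than a yes/no answer.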

Chemical  synthesis 

We have already mentioned some systems designed for chemical synthesis. They are direct
applications of heuristic search. Various approaches {26}, {10}, {27}, {12}, {13}, {28},
{11} have been studied. Such systems generally include a multimodal representation of
general chemical knowledge and heuristics.

Experiment planning

Such techniques may concern the planning of experiments in molecular biology: MOLGEN
{6} was the first ES to include reasoning based on hierarchical planning, relying on
three knowledge levels: strategies, design and elementary actions. A
constraint-management mechanism eliminates inconsistent solutions.
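Hierarchical planning with constraint elimination can be sketched in three levels: a strategy names the design decisions to take, each decision has alternative options, and each option is refined into elementary actions. Inconsistent partial designs are pruned as soon as a constraint is violated. This is a simplified backtracking sketch, not a reproduction of MOLGEN's own mechanism, and all the names and constraints are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge at three levels: strategy -> design decisions -> actions.
STRATEGIES: Dict[str, List[str]] = {"clone_gene": ["choose_vector", "choose_enzyme"]}
DESIGN_OPTIONS: Dict[str, List[str]] = {
    "choose_vector": ["vector_1", "vector_2"],
    "choose_enzyme": ["enzyme_a", "enzyme_b"],
}
ACTIONS: Dict[str, List[str]] = {
    "vector_1": ["prepare vector_1"],
    "vector_2": ["prepare vector_2"],
    "enzyme_a": ["digest with enzyme_a", "ligate"],
    "enzyme_b": ["digest with enzyme_b", "ligate"],
}
# Constraints: design choices that cannot be combined.
INCOMPATIBLE: Set[Tuple[str, str]] = {
    ("vector_1", "enzyme_b"), ("vector_2", "enzyme_a"), ("vector_2", "enzyme_b"),
}


def consistent(choices: List[str]) -> bool:
    """True if no pair of choices violates a constraint (in either order)."""
    return all(
        (a, b) not in INCOMPATIBLE and (b, a) not in INCOMPATIBLE
        for i, a in enumerate(choices)
        for b in choices[i + 1:]
    )


def plan(goal: str) -> List[List[str]]:
    """Refine strategy -> design -> elementary actions, pruning inconsistent designs."""
    decisions = STRATEGIES[goal]
    plans: List[List[str]] = []

    def assign(i: int, chosen: List[str]) -> None:
        if not consistent(chosen):
            return  # constraint violated: abandon this branch immediately
        if i == len(decisions):
            plans.append([step for c in chosen for step in ACTIONS[c]])
            return
        for option in DESIGN_OPTIONS[decisions[i]]:
            assign(i + 1, chosen + [option])

    assign(0, [])
    return plans


if __name__ == "__main__":
    print(plan("clone_gene"))
```

Only one of the four possible designs survives the constraints, and it is expanded into the action list `["prepare vector_1", "digest with enzyme_a", "ligate"]`.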

Intelligent  instrumentation  and  robotics 

They extend the range of automated analytical procedures.

The evolution of VLSI integrated circuits makes it possible to envisage expert systems
that improve both the interfaces and the operation of various types of instruments.
Applications may concern, for example:

-  the  introduction  of  expert  systems  acting  directly  when  the  information  is 
captured  (spectra  interpretation,  for  instance), 

- aid for the processing and interpretation of data such as electrophoresis gels {29}.
This technique is currently expanding considerably, together with the study and
production of proteins. Its automation would be an important step,

- aid for protein engineering: crystallization or purification techniques, in which ES
may intervene together with the automation of the processes,

- quality control at the different steps of the production process (genetic control,
microbiology, biochemistry, etc.). Here again, image processing and data interpretation
combined with sophisticated reasoning techniques are important.

Fault  diagnosis  in  chemical  analytical  instrumentation 

Troubleshooting a complex installation requires much knowledge and much practice.
Moreover, the number of possible failure causes may be very high.
Expert systems may answer this problem by centralizing and redistributing knowledge, by
proposing a rigorous method for solving a problem, and thus by leading to more efficient
and rapid repairs.

Information  entry 

Rule-based systems may be used for the syntactic control of data entered by the user of
a graphics system, for example checking the validity of molecules.
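Such syntactic control can be expressed as a small set of independent rules applied to the atoms and bonds entered by the user. The sketch below assumes a hydrogen-suppressed description (atom label mapped to element, plus bonds with an order) and a simplified table of usual valences that ignores charges and hypervalent atoms; the rules and messages are illustrative, not those of any particular graphics system.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Simplified usual valences (ignores charges, hypervalent S or P, radicals, etc.).
# Hydrogens are implicit, so a valence below the limit is accepted.
MAX_VALENCE: Dict[str, int] = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def check_molecule(atoms: Dict[str, str], bonds: List[Tuple[str, str, int]]) -> List[str]:
    """Apply independent syntactic rules; return error messages (empty list if valid)."""
    errors: List[str] = []
    used: Counter = Counter()  # sum of bond orders per atom label
    seen: Set[frozenset] = set()
    for a, b, order in bonds:
        if a not in atoms or b not in atoms:
            errors.append(f"bond {a}-{b} refers to an undeclared atom")
        elif a == b:
            errors.append(f"atom {a} is bonded to itself")
        elif frozenset((a, b)) in seen:
            errors.append(f"duplicate bond {a}-{b}")
        elif order not in (1, 2, 3):
            errors.append(f"bond {a}-{b} has invalid order {order}")
        else:
            seen.add(frozenset((a, b)))
            used[a] += order
            used[b] += order
    for label, element in atoms.items():
        limit = MAX_VALENCE.get(element)
        if limit is None:
            errors.append(f"atom {label}: unknown element {element}")
        elif used[label] > limit:
            errors.append(f"atom {label} ({element}) has valence {used[label]} > {limit}")
    return errors


if __name__ == "__main__":
    ethanol = {"C1": "C", "C2": "C", "O1": "O"}
    print(check_molecule(ethanol, [("C1", "C2", 1), ("C2", "O1", 1)]))  # []
    print(check_molecule({"C1": "C", "O1": "O"}, [("C1", "O1", 3)]))
```

The second call reports `atom O1 (O) has valence 3 > 2`. Because each rule is a separate test, new checks (e.g. charge balance) can be added without touching the existing ones.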

Oral  language  processing 

Oral language may, in certain situations, be a convenient way of entering data, for
example biological sequences such as DNA strands for further automatic analysis.
Everyday use of speech is still a research domain.

Information retrieval and NL processing

These techniques allow information retrieval systems to be used in a "friendly" way,
without a specific query language.

Molecular  graphics 

The graphic presentation on computer screens of the shape of active sites inside
molecules makes a new molecular approach possible. Rule-based systems are able to give
the optimal representation, to modify the molecule's configuration directly (and then
compute its energy), to superpose this molecule as closely as possible on other rigid or
flexible models, and so on. The results of the computational methods and of visual
observation may lead the specialist to a better understanding of some properties and to
model this knowledge, which can then be included to enrich an ES dealing with the
problem in question. The applications in pharmacology, for example, are numerous,
especially toward a better understanding of the central nervous system {30}, or for
modelling the reactivity of molecules (heterogeneous kinetics), etc. {31}.

Intelligent  tutoring  systems 

The generalized declarative way of representing knowledge and the explanation abilities
of ES make it possible to build systems designed for "intelligent" computer-aided
instruction (ICAI), here for chemical education {28}. These systems are generally
composed of four parts (not always clearly separated): a domain knowledge module, a
pedagogical knowledge module, a student model or "profile", and the user interface.
DISCUSSION


SOULIE - You said during your talk that the learning abilities of expert systems are
modest. Could you explain and comment on this point with examples of expert systems
relating to the field of chemistry?

HATON - At present, expert systems can increase their own problem-solving ability only
through simple learning mechanisms, generally involving logical reasoning.

It should be noted that the first expert system, DENDRAL, designed for structure
elicitation, included a learning module named Meta-DENDRAL. From a collection of
examples (data = molecular formula and mass spectrum of an organic molecule, solution =
structural formula), it was able to infer new rules for the interpretation of mass
spectra.

Automatic learning is still a research topic. However, some applications are appearing
in domains where the use of automatic systems implies strong interaction with the user,
who confirms or rejects the solution proposed by the system (synthesis of organic
molecules, for instance).


DEVILLERS - My question is somewhat related to theology. In expert systems, do you need
to introduce the non-contradiction principle (or rule) explicitly, or is it
self-contained?

HATON  -  When  an  expert  system  is  being  used,  reasoning  rules  (especially  logical 
ones)  are  used  to  infer  new  information  concerning  the  problem  in  question  and  to 
draw  conclusions,  make  a  decision,  etc. 

The consistency of the knowledge base must be guaranteed when the system is built.
Indeed, it is known that "from wrong premises, anything may be inferred". Some tools now
exist to help the knowledge engineer detect possible inconsistencies within the
knowledge base.
SUMMARY

As an introduction to the methods of building an expert system,
a program that studies molecules depicted as molecular graphs
is presented. The expert systems development tool (SPIRAL)
combines the PROLOG language with an object formalism. The
molecules are described as objects with two main slots: their
labelled atoms and the types of bonds between them.

INTRODUCTION 

More and more effort is being made to predict physical,
chemical and biological properties of molecules, as exemplified
by the development of quantitative structure-activity
relationship (QSAR) studies (ref. 1). The problem then arises of
manipulating and extracting molecular structures and information
contained in databases. We focus here on the topology
determined by the description of the graph of a molecule.

We  demonstrate  how  a  declarative  language  can  be  used  to
give  information  pertaining  to  such  QSAR  studies:  detection  of
given  substructures,  enumeration  of  subgraphs  and  associated
calculations  such  as  the  determination  of  molecular  connectivity
indexes.  The  listing  of  the  examples  can  be  obtained  from  the
author  upon  request.

LOGIC  PROGRAMMING  LANGUAGE 

The  expert  system  development  tool  used  was  SPIRAL.  This
tool  was  developed  by  Dr.  Yves  SOUCHET  (CEA,  Commissariat  à
l'Energie  Atomique,  Service  de  Mathématiques  Appliquées).
Programmed  in  C,  it  encompasses  the  PROLOG  declarative  language
with  improvements  (first-order  logic  formalism)  and,  in
addition,  supplies  an  object  formalism  (with  daemons  and
consistency  checking).  Thus  it  is  possible  to  define  prototypes
(classes)  and  class  instances  (objects).  The  classes  have
system-  and  user-defined  slots  with  simple  value  inheritance.

A  version  is  adapted  to  PC-compatible  microcomputers,
whereas  the  full  version  is  mainly  developed  for  UNIX
workstations  (with  graphics,  input  and  output  primitives).
Procedural  subroutines  can  be  called  from  within  the  program.

PROGRAM

Description  of  the  molecular  structures. 

A  molecular  structure  is  defined  as  an  object.  It  can  be
described  directly  by  the  user  when  running  the  program,  as
objects,  facts  and  inference  rules  can  be  added  (or  removed)
interactively.  This  is  obviously  tedious,  and  it  is  preferable
to  prepare  an  editable  data  base  which  can  be  further
structured  and,  if  necessary,  partitioned  to  avoid  memory  size
limitations.

Basically,  a  molecule  is  described  by  the  sequence  of  its
atoms,  their  labels  (to  distinguish  identical  atoms)  and  the
list  of  the  types  of  the  bonds  between  them  (Figs.  1,  2).  The
labels  of  the  atoms  are  chosen  by  the  user.  This  corresponds  to
the  vertices  and  the  edges  of  the  graph  representing  a
molecular  structure  (refs.  2-4).  These  two  main  slots,  atoms
and  bonds,  are  lists  of  lists.  Each  element  of  the  first  list
contains  an  atom  name  and  its  label.  Each  element  of  the  second
list  contains  a  bond  type  and  the  two  labels  of  the  bonded  atoms.


          O                   b
          ||                  ||
    H     C     H       1     a     3
     \   / \   /         \   / \   /
       N     N             c     d
       |     |             |     |
       H     H             2     4

Fig.  1.  Example  of  urea.  Scheme  of  the  molecule. 
def_classe(molecule, objet, (atoms, liste), (bonds, liste));
def_instance(urea, molecule,
    (atoms, (C,a), (O,b), (N,c), (N,d), (H,1), (H,2), (H,3), (H,4)),
    (bonds, (D,a,b), (S,a,c), (S,a,d),
            (S,c,1), (S,c,2), (S,d,3), (S,d,4)))

Fig.  2.  Instance  definition  for  the  urea  molecule.  S:  single  bond;
D:  double  bond.
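For readers without access to SPIRAL, the same two-slot description can be mirrored in Python. This is a sketch under our own assumptions (a dataclass stands in for the SPIRAL class, and the bond codes S and D are mapped to orders 1 and 2); it shows how the atoms and bonds lists of Fig. 2 yield an adjacency list directly.

```python
from dataclasses import dataclass

BOND_ORDER = {"S": 1, "D": 2, "T": 3}  # assumed codes: single, double, triple


@dataclass
class Molecule:
    name: str
    atoms: list[tuple[str, str]]        # (element, label)
    bonds: list[tuple[str, str, str]]   # (bond type, label 1, label 2)

    def element(self, label: str) -> str:
        """Element carried by the atom with this label."""
        return {lab: el for el, lab in self.atoms}[label]

    def neighbours(self) -> dict[str, list[tuple[str, int]]]:
        """Adjacency list: label -> [(neighbour label, bond order)]."""
        adj = {label: [] for _, label in self.atoms}
        for btype, a, b in self.bonds:
            order = BOND_ORDER[btype]
            adj[a].append((b, order))
            adj[b].append((a, order))
        return adj


# Instance corresponding to Fig. 2 (urea).
urea = Molecule(
    "urea",
    atoms=[("C", "a"), ("O", "b"), ("N", "c"), ("N", "d"),
           ("H", "1"), ("H", "2"), ("H", "3"), ("H", "4")],
    bonds=[("D", "a", "b"), ("S", "a", "c"), ("S", "a", "d"),
           ("S", "c", "1"), ("S", "c", "2"), ("S", "d", "3"), ("S", "d", "4")],
)
```

`urea.neighbours()["a"]` lists the oxygen b with order 2 and the two nitrogens with order 1, i.e. the carbonyl carbon of Fig. 1.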

Additional,  optional  slots  may  contain  the  name(s)  of  the 
file(s)  for  the  bibliographic  reference  list,  the  physical 
data,  the  scheme  of  the  molecule  which  can  be  displayed  on  the 
screen,  the  screen  coordinates  of  the  atoms  in  order  to 
highlight  some  particular  atoms  following  a  query,  and  so  on. 
Defined  atoms  can  be  functional  groups  or  even  molecular
substructures,  depending  on  the  chosen  level  of  description.
Groups  and  substructures  can  in  turn  be  defined  as  objects;  the
description  of  the  molecule  is  then  shorter  and  more
user-readable,  without  losing  its  full  content  if  needed  by  later
treatments.  Hydrogen  atoms  can  be  suppressed  at  will,  since  they
are  straightforward  to  retrieve  once  the  bond  orders  and  the
connected  atoms  are  known.  Aliases  can  be  given  to  a  molecule
or  a  group  name.
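Why suppressed hydrogens are easy to restore can be shown in a few lines of Python. The sketch assumes standard valences for neutral atoms (C 4, N 3, O 2) and ignores charges and hypervalent elements; the number of hydrogens to add back to a heavy atom is then its valence minus the sum of its bond orders.

```python
VALENCE = {"C": 4, "N": 3, "O": 2}     # assumed valences, neutral atoms only
BOND_ORDER = {"S": 1, "D": 2, "T": 3}  # assumed bond codes


def implicit_hydrogens(atoms, bonds):
    """atoms: [(element, label)], bonds: [(type, label1, label2)], no H given.
    Returns {label: number of hydrogens to add back}."""
    used = {label: 0 for _, label in atoms}
    for btype, a, b in bonds:
        used[a] += BOND_ORDER[btype]
        used[b] += BOND_ORDER[btype]
    return {label: VALENCE[el] - used[label] for el, label in atoms}


# Urea with its hydrogens suppressed (labels as in Fig. 2).
heavy_atoms = [("C", "a"), ("O", "b"), ("N", "c"), ("N", "d")]
heavy_bonds = [("D", "a", "b"), ("S", "a", "c"), ("S", "a", "d")]
print(implicit_hydrogens(heavy_atoms, heavy_bonds))
# {'a': 0, 'b': 0, 'c': 2, 'd': 2}
```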

Single  molecule  queries. 

Logic  programming  makes  it  easy,  inter  alia,  to  find  the
shortest  distance  between  atoms  (in  terms  of  number  of  bonds),
the  presence  of  cycles,  and  the  molecular  connectivity  indexes
(refs.  5-7)  after  enumerating  the  appropriate  subgraphs
(paths,  clusters,  ...).  The  queries  can  be  logically  combined.
They  are  derived  from  basic  inference  rules  which  manipulate  the
slots  of  the  molecule  object.  Owing  to  a  fundamental  property  of
the  declarative  language  (backtracking),  the  knowledge  base  can
be  scanned  automatically.
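The following Python sketch imitates three of these single-molecule queries on a plain adjacency list: a breadth-first search for the distance in bonds between two atoms, a ring test (union-find over the bonds), and the enumeration of all simple paths of a given length by depth-first extension with backtracking, which is what Prolog provides natively. The representation and function names are ours, not SPIRAL's.

```python
from collections import deque

# Urea, labels as in Fig. 2 (undirected adjacency list).
UREA = {
    "a": ["b", "c", "d"], "b": ["a"],
    "c": ["a", "1", "2"], "d": ["a", "3", "4"],
    "1": ["c"], "2": ["c"], "3": ["d"], "4": ["d"],
}


def bond_distance(graph, start, goal):
    """Shortest distance in number of bonds (BFS); None if not connected."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def has_cycle(graph):
    """True if some bond closes a ring (union-find over the bonds)."""
    parent = {v: v for v in graph}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u in graph:
        for v in graph[u]:
            if u < v:  # visit each undirected bond once
                ru, rv = find(u), find(v)
                if ru == rv:
                    return True
                parent[ru] = rv
    return False


def paths_of_length(graph, k):
    """All simple paths with exactly k bonds, each reported once."""
    found = set()

    def extend(path):
        if len(path) == k + 1:
            found.add(min(tuple(path), tuple(reversed(path))))
            return
        for nxt in graph[path[-1]]:
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()  # backtrack

    for start in graph:
        extend([start])
    return found
```

For urea, `bond_distance(UREA, "1", "3")` is 4 (H-N-C-N-H), and there are 7, 9 and 8 paths of one, two and three bonds: these path counts are the raw material of the connectivity indexes.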

Two-molecule  queries.

As  noted  above,  each  molecule  has  its  own  atom  labels.
Consequently,  in  order  to  compare  two  molecules  directly,  these
labels  are  replaced  by  numbers.  The  sequence  numbers  can  be
manipulated  by  permuting,  extracting,  sorting,  etc.  Moreover,
identical  answers  are  usually  eliminated  automatically.  Thus
particular  substructures,  or  molecules  embedded  in  others,  may
be  located.
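A two-molecule query can be sketched as a naive backtracking substructure search, assuming both molecules have been renumbered as described and are stored as {number: element} and {frozenset(pair): bond order} dictionaries (our representation, not the author's). A query bond must be present in the target with the same order, and answers that cover the same target atoms are merged, in the spirit of the automatic elimination of identical answers.

```python
def embeddings(q_el, q_bonds, t_el, t_bonds):
    """Match query atoms to target atoms by backtracking.

    q_el, t_el: {atom number: element}
    q_bonds, t_bonds: {frozenset({i, j}): bond order}
    Returns the distinct sets of target atoms onto which the query maps."""
    q_atoms = sorted(q_el)
    results = set()

    def consistent(mapping, q, t):
        # Each query bond between q and an already mapped atom must exist
        # in the target with the same order (extra target bonds are allowed).
        for q_other, t_other in mapping.items():
            q_order = q_bonds.get(frozenset((q, q_other)))
            if q_order is not None and q_order != t_bonds.get(frozenset((t, t_other))):
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            results.add(frozenset(mapping.values()))  # identical answers merge
            return
        q = q_atoms[len(mapping)]
        for t in t_el:
            if t not in mapping.values() and t_el[t] == q_el[q] and consistent(mapping, q, t):
                mapping[q] = t
                extend(mapping)
                del mapping[q]  # backtrack

    extend({})
    return results


# Urea renumbered: 1 C, 2 O, 3-4 N, 5-8 H (labels a, b, c, d, 1-4 of Fig. 2).
UREA_EL = {1: "C", 2: "O", 3: "N", 4: "N", 5: "H", 6: "H", 7: "H", 8: "H"}
UREA_BONDS = {frozenset(pair): order for pair, order in [
    ((1, 2), 2), ((1, 3), 1), ((1, 4), 1),
    ((3, 5), 1), ((3, 6), 1), ((4, 7), 1), ((4, 8), 1)]}

# Query: the amide fragment N-C=O.
AMIDE_EL = {1: "N", 2: "C", 3: "O"}
AMIDE_BONDS = {frozenset((1, 2)): 1, frozenset((2, 3)): 2}
```

`embeddings(AMIDE_EL, AMIDE_BONDS, UREA_EL, UREA_BONDS)` returns two answers, {1, 2, 3} and {1, 2, 4}: urea contains the amide fragment twice, once through each nitrogen.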

CONCLUSION 

Applied  to  molecular  graphs,  a  logic  programming  language  makes
enumeration  tasks  particularly  easy.  In  a  given  program  one  can
easily  extend  both  the  molecular  data  base  and  the  set  of  queries
it  is  able  to  answer,  with  the  goal  of  an  artificial  intelligence
approach  to  structure-activity  studies.


REFERENCES 

1  G.  Naray-Szabo,  The  harmony  of  molecules,  in:  J.  Maruani
(Ed.),  Molecules  in  Physics,  Chemistry  and  Biology.  General
Introduction  to  Molecular  Sciences,  Vol.  1,  Kluwer,  New
York,  1988,  pp.  205-231.

2  S.C.  Basak,  V.R.  Magnuson,  G.J.  Niemi,  R.R.  Regal,
Determining  structural  similarity  of  chemicals  using
graph-theoretic  indices,  Discrete  Applied  Mathematics,  19  (1988)
17-44,  in:  J.W.  Kennedy  and  L.C.  Quintas  (Eds.),  Applications
of  Graphs  in  Chemistry  and  Physics,  North-Holland,
Amsterdam,  1980.

3  H.J.  Luinge,  G.J.  Kleywegt,  H.A.  Van't  Klooster  and  J.H.  van
der  Maas,  Artificial  intelligence  used  for  the
interpretation  of  combined  spectral  data.  3.  Automated
generation  of  interpretation  rules  for  infrared  spectral
data,  J.  Chem.  Inf.  Comput.  Sci.,  27  (1987)  95-99.

4  Y.  Sun,  L.  Pierrons  and  M.-C.  Baton,  Résolution  de  problème
et  raisonnement  expert  en  synthèse  de  molécules  organiques:
le  système  OASIS,  Journées  Internationales  Systèmes  Experts
et  Applications,  Avignon  (1988)  pp.  483-495.

5  K.  Takeuchi,  C.  Kuroda  and  M.  Ishida,  Prolog  program  for
subgraph  enumeration  and  calculation  of  molecular
connectivity  indexes,  J.  Comput.  Chem.,  10  (1989)  380-385.

6  G.  Klopman,  C.  Raychaudhury  and  R.V.  Henderson,  A  new
approach  to  structure-activity  using  distance  information
content  of  graph  vertices,  Mathl  Comput.  Modelling,  11
(1988)  635-640.

7  M.  Randic,  S.C.  Grossman,  B.  Jerman-Blazic,  D.H.  Rouvray  and
S.  El-Basil,  Modelling  Drug  Design  II,  Mathl  Comput.
Modelling,  11  (1988)  837-842.






Modelling  of  Molecular  Structures  and  Properties.  Proceedings  of  an  International  Meeting, 
Nancy,  France,  11-15  September  1989,  J.-L.  Rivail  (Ed.) 

Studies  in  Physical  and  Theoretical  Chemistry,  Volume  71,  pages  747-753 
©  1990  Elsevier  Science  Publishers  B.V.,  Amsterdam  —  Printed  in  The  Netherlands 


HANDLING  THE  STRUCTURAL  INFORMATION  RESULTING  FROM 
MOLECULAR  MODELLING  FOR  DRUG  ACTIVITY  PREDICTIONS:  THE 
SARAH  SYSTEM 


Roger  ROZOT,  Jean-Louis  RIVAIL  and  Herve  MATHIS

Laboratoire  de  Chimie  Theorique,  UA  CNRS  nbr  510,  Universite  de  Nancy  I,  Domaine 
Scientifique  Victor  Grignard,  BP  239,  54506  Vandoeuvre  les  Nancy  Cedex  (France). 


SUMMARY 

SARAH,  a  new  program  which  combines  artificial  intelligence  (pattern  recognition  and  machine
learning)  and  statistics,  has  been  developed  to  assist  the  chemist  in  molecular  modelling
and  in  the  search  for  structure-activity  relationships.  It  extracts  the  relevant  information  from  the
data  of  a  training  set  to  create  both  qualitative  and  quantitative  rules  which  appear  to  govern  the
activity  of  related  molecules.

INTRODUCTION 

The  SARAH  software  (for  "Structure-Activity:  Relationships  by  Apprenticeship  and  Heuristics")
[1,2]  is  presented.  Its  purpose  is  to  assist  the  chemist  in  devising  molecules  with
pharmacologically  interesting  properties.  After  a  molecular  modelling  step,  one  has  three  different
levels  of  structural  information  on  a  molecule:

(i)  the  nature  and  the  connectivity  of  the  atoms,  which  come  from  the  input  data,

(ii)  the  geometry  of  one  or  several  conformations, 

(iii)  the  electronic  properties,  especially  if  the  modelling  is  based  upon  a  quantum  chemical 
computation. 

This  information  is  very  rich  and  is  assumed  to  contain  most  of  the  data  which  are  liable  to
play  a  role  in  the  biological  activity.  The  problems  we  have  to  solve  are:

(i)  How  to  extract  the  relevant  information.

(ii)  How  to  correlate  it  with  the  activity.

The  SARAH  approach  assumes  that  there  exists  a  training  set  of  molecules  whose  activity  has
been  measured  for  the  biological  process  of  interest.  Furthermore,  one  assumes  that  for  each
molecule  the  geometry  of  the  most  stable  conformation(s)  is  known,  as  well  as  the  electronic
wavefunction,  usually  obtained  with  a  semi-empirical  quantum  chemical  package  such  as  GEOMO
[3].  These  data  are  used  to  initiate  a  learning  process  which  should  discriminate  between  active  and
inactive  molecules  on  the  basis  of  purely  structural  characteristics.  Within  each  class,  the  quantitative
relationships  between  the  score  of  the  activity  test  and  the  structural  descriptors  are  established  by
means  of  a  classical  statistical  method.  Once  this  initialization  procedure  is  completed,  it  becomes
possible  to  submit  a  new  molecular  structure:  the  system  assigns  it  to  one  of  the  classes  and  gives
its  position  relative  to  the  other  molecules  of  the  class.

THE  LEARNING  PROCESS 

This  step  is  a  generalization  of  the  method  used  by  a  chemist  who  wants  to  compare  different 
molecular  structures  in  a  series  of  pharmacologically  related  molecules. 

It  starts  by  finding  a  substructure,  or  reference  pattern,  which  is  common  to  all  the  molecules
of  the  series,  in  order  to  allow  an  unambiguous  superimposition  of  the  molecular  geometries.  This
can  be  done  automatically  by  two-dimensional  pattern  recognition  if  the  reference  pattern  has  the
same  connectivity  in  all  the  molecules  of  the  training  set.  When  the  reference  pattern  is  defined
by  a  limited  number  of  atoms  (usually  heteroatoms)  which  occupy  the  same  relative  positions
in  space  in  all  the  active  conformations  but  have  different  connectivities,  this  procedure
fails.  In  this  case  the  reference  pattern  may  be  defined  by  the  user  through  an  interactive  graphics
routine  which  displays  the  superimposition  of  two  molecules.

Then  the  user  has  to  designate  a  reference  molecule  (usually  in  the  active  subset),  and  all
the  molecules  of  the  training  set  are  compared  to  it  by  scanning  the  space  around  an  origin  chosen
on  the  reference  pattern.  Each  direction  has  two  intersections  with  the  van  der  Waals  molecular
surface,  at  which  several  structural  descriptors  can  be  defined:  distance  to  the  origin,  electrostatic
potential,  electric  field,  etc.
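The scan along one direction can be sketched as follows, assuming the van der Waals surface is the envelope of atom-centred spheres and the electrostatic potential is approximated by point charges (Coulomb constant omitted). Neither assumption is stated for SARAH, which uses quantum chemical wavefunctions, and the function names are hypothetical. Along a ray, the outermost surface point is the farthest exit point over all spheres hit, since nothing beyond it lies inside any sphere.

```python
import math


def ray_surface_distance(origin, direction, atoms):
    """Distance from `origin` to the outermost van der Waals surface point
    along a ray. atoms: [(x, y, z, radius)]. None if the ray misses."""
    ox, oy, oz = origin
    norm = math.sqrt(sum(c * c for c in direction))
    dx, dy, dz = (c / norm for c in direction)
    best = None
    for x, y, z, r in atoms:
        # Solve |o + t d - c|^2 = r^2, i.e. t^2 + 2 b t + c = 0 (unit d).
        fx, fy, fz = ox - x, oy - y, oz - z
        b = fx * dx + fy * dy + fz * dz
        c = fx * fx + fy * fy + fz * fz - r * r
        disc = b * b - c
        if disc < 0:
            continue  # the ray misses this sphere
        t_exit = -b + math.sqrt(disc)  # far intersection with this sphere
        if t_exit > 0 and (best is None or t_exit > best):
            best = t_exit
    return best


def scan_line(origin, direction, atoms):
    """The two surface intersections of the line through `origin`."""
    back = tuple(-c for c in direction)
    return (ray_surface_distance(origin, direction, atoms),
            ray_surface_distance(origin, back, atoms))


def potential(point, charges):
    """Point-charge electrostatic potential, charges [(x, y, z, q)]."""
    return sum(q / math.dist(point, (x, y, z)) for x, y, z, q in charges)
```

Evaluating `potential` at the point found by `ray_surface_distance` gives the second descriptor of the list above for that direction.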

The  similarities  and  the  differences  between  each  molecule  of  the  training  set  and  the  reference
molecule  are  pointed  out.  This  makes  machine  learning  of  the  activity  concept  possible,  by
searching  for  the  structural  descriptors  which  discriminate  between  the  two  classes  of  molecules,
that  is  to  say  by  determining  the  activity  rules  [4-10].
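The kind of rule found later for the benzodiazepines (each descriptor confined to an interval) can be learned in a very simple way. The Python sketch below is our own illustration, not the learning algorithm used in SARAH: for each descriptor it takes the range spanned by the active molecules, and it returns the smallest conjunction of such intervals that excludes every inactive molecule of the training set.

```python
from itertools import combinations


def learn_interval_rule(actives, inactives, max_terms=2):
    """actives/inactives: lists of {descriptor name: value}.
    Returns {name: (low, high)} using as few descriptors as possible, such
    that every active lies in all intervals and no inactive does; else None."""
    names = sorted(actives[0])
    ranges = {n: (min(m[n] for m in actives), max(m[n] for m in actives))
              for n in names}

    def covered(mol, rule):
        return all(lo <= mol[n] <= hi for n, (lo, hi) in rule.items())

    for size in range(1, max_terms + 1):
        for subset in combinations(names, size):
            rule = {n: ranges[n] for n in subset}
            # Actives are covered by construction; check the inactives.
            if not any(covered(m, rule) for m in inactives):
                return rule
    return None
```

With made-up descriptor values, three actives and two inactives that are each rejected by a different descriptor produce a two-interval rule of the same shape as the one reported for diazepam derivatives.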

Finally,  in  each  subset,  a  multivariate  regression  is  established  between  the  value  of  the  activity
score  and  a  series  of  other  structural  descriptors  which  may  play  a  role  in  modifying  the  activity.
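Within one subset, the regression step amounts to an ordinary least-squares fit. A minimal sketch with NumPy, assuming the descriptors are already collected in a matrix (one row per molecule) and that a plain linear model with an intercept is wanted; the function name is hypothetical.

```python
import numpy as np


def fit_qsar(descriptors, activity):
    """Ordinary least squares: activity ~ b0 + sum_j b_j * descriptor_j.
    descriptors: (n_molecules, n_descriptors). Returns (b0, b, r2)."""
    X = np.asarray(descriptors, dtype=float)
    y = np.asarray(activity, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    ss_res = float(np.sum((y - pred) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coef[0], coef[1:], 1.0 - ss_res / ss_tot
```

In practice the number of molecules in a subset should be well above the number of descriptors, otherwise the fit is exact but meaningless.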

The  flow  chart  of  the  learning  process  is  given  in  Figure  1.

SUBMITTING  A  NEW  STRUCTURE

The  activity  rules  are  usually  very  useful  for  devising  an  improved  molecular  structure.  After  the
molecular  modelling  process,  the  new  structure  is  submitted  to  the  activity  rules  and  an  expected
activity  score  is  given  (see  Figure  2).

When  a  molecular  descriptor  of  the  new  molecule  lies  outside  the  range  of  variation  observed
in  the  training  set,  the  system  automatically  indicates  that  its  predictions  may  be  wrong,  owing  to
an  effect  which  has  not  been  taken  into  account  in  the  learning  process.  This  is  particularly  true
of  steric  effects,  for  which  the  biological  response  does  not  vary  monotonically  with  the
molecular  size.
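The range check behind this warning is easy to sketch. The Python function below (a hypothetical name, not SARAH's code) compares each descriptor of a candidate with the minimum and maximum observed over the training set and reports those that fall outside.

```python
def out_of_range(training, candidate):
    """Descriptors of `candidate` lying outside the training-set range.
    training: list of {name: value}; candidate: {name: value}.
    Returns {name: (value, training min, training max)}."""
    warnings = {}
    for name, value in candidate.items():
        values = [mol[name] for mol in training]
        lo, hi = min(values), max(values)
        if not lo <= value <= hi:
            warnings[name] = (value, lo, hi)
    return warnings
```

An empty result does not guarantee a reliable prediction; it only means that no single descriptor is being extrapolated.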
Figure  1:  The  learning  step  of  the  SARAH  system

Figure  2:  Submitting  a  new  structure

UPDATING  THE  SYSTEM

When  the  activity  of  a  structure  which  does  not  belong  to  the  training  set  has  been  determined
quantitatively,  it  is  possible  to  add  it  to  the  training  set  and  to  update  both  the  activity  rules  and
the  QSAR.
EXAMPLE  OF  APPLICATION

The  first  attempt  to  use  the  SARAH  system  was  for  the  antiepileptic  family  of
1,4-benzodiazepines.  The  training  set  contains  49  molecules,  34  in  the  active  subset  and  15  in  the
inactive  one.  Diazepam  (7-chloro-1-methyl-5-phenyl-3H-1,4-benzodiazepin-2-one)  was  chosen  as
the  reference  molecule  in  the  active  set.

The  learning  step  automatically  generates  a  very  simple  activity  rule  based  only  on  geometric
data  related  to  the  van  der  Waals  surface  of  the  molecules:

If  the  length  of  the  substituent  on  carbon  7  (D7)  is  between  3.20  Å  and  5.28  Å,  and  if  the
length  of  the  substituent  on  carbon  5  (D5)  is  smaller  than  6.58  Å,  then  the  molecule  is  active.

For  these  molecules,  the  activity  is  linearly  correlated  with  four  electronic  descriptors:  the
electrostatic  potentials  in  the  vicinity  of  each  nitrogen  atom  of  the  diazepine  ring  and  in  the
vicinity  of  atom  3'  of  the  phenyl  ring,  and  finally  the  electric  field  in  the  vicinity  of  atom  8  on
the  benzo  ring  (see  Figure  3).




Figure  3:  Result  of  the  learning  step:

(3.20  Å  <  D7  <  5.28  Å)  and  (D5  <  6.85  Å)  ⇒  active  molecule

These  descriptors  are  easy  to  interpret.  The  first  two  are  clearly  related  to  the  basicity  of  the
nitrogen  atoms  (or  to  their  ability  to  form  an  H  bond).  The  last  one  is  clearly  connected  to  the
polarity  of  the  substituent  at  position  7.  The  third  one  may  be  a  measure  of  the  electronic
influence  of  the  substitution  of  the  phenyl  ring  at  position  5.
The  learning  step  was  followed  by  the  analysis  of  a  test  set  of  12  new  molecules.  Ten  of
them  are  classified  in  the  appropriate  subset,  and  the  estimated  activity  is  in  fair  agreement  with
experiment.  The  last  two  molecules,  which  are  considered  inactive  and  are  classified  erroneously,
are  the  only  ones  which  contain  a  large  substituent  (ethyl  and  phenyl)  on  carbon  3,  whereas  the
largest  substituent  in  the  molecules  of  the  training  set  is  a  methyl  group.  As  indicated  previously,
this  particularity  is  flagged  by  the  system.  It  is  an  indication  of  another  steric  effect  which
would  probably  produce  a  complementary  activity  rule,  but  this  assumption  has  not  been  tested
yet.

CONCLUSION 

The  SARAH  system  represents  progress  in  the  search  for  correlations  between  structure  and
activity.

By  separating  the  analysis  into  two  steps,  the  search  for  general  activity  rules  and  the
quantitative  correlation  of  this  activity  with  structural  data,  it  reproduces  rather  closely  the
intellectual  process  of  the  chemist.  This  procedure  takes  into  account  the  differences  between  steric
and  electronic  effects  in  biological  activity.

It  seems  plausible  that,  within  a  reasonable  range,  this  activity  varies  monotonically  with  the
electronic  effects,  so  that  the  corresponding  descriptors  are  well  adapted  to  QSAR.  Conversely,
steric  effects  often  imply  that  along  some  directions  the  molecule  must  be  neither  too  long  nor
too  short.  This  feature  can  easily  be  expressed  by  activity  rules.

This  difference  clearly  appears  in  the  1,4-benzodiazepine  family,  in  which  the  rule  is  purely
steric  and  the  QSAR  factors  are  all  electronic.

This  example  illustrates  rather  well  the  superiority  of  the  SARAH  approach  over  purely
statistical  methods  [1,11].
ABSTRACT:

For  a  few  years  now,  molecular  shape  has  emerged  as  one  of  the
main  parameters  in  simulation  strategies.  There  is  general
agreement  that  3D  geometry  is  of  prime  importance  for  molecular
recognition  and  for  many  structure-property  relationships.  Despite
these  efforts,  automatic  identification  of  three-dimensional
structural  fragments  is  still  an  open  problem.  Most  of  the  time,
shape  similarities  cannot  be  compared  quantitatively.

As  a  new  description  tool,  we  propose  the  OCTANG  method.  This
system  deals  with  topological  information  (a  graph)  in  which  the
3D  information  lies  in  valued  edges.  The  OCTANG  system  has  already
succeeded  in  identifying  a  predefined  molecular  fragment  among  a
set  of  compounds.  It  should  allow  us  to  develop  new  algorithms  for
structural  database  management,  pharmacophore  identification  and
maximal  common  substructure  searching.


INTRODUCTION 

Computer  graphics  has  emerged  as  an  effective  tool  in  chemistry.
One  can  now  describe  and  simulate  some  quantum  mechanical
concepts,  such  as  electron  densities  or  electrostatic  potential
(1).  These  electronic  properties  are  essential  in  "molecular
recognition"  mechanisms,  where  the  complementarity  of  the
surfaces  may  optimize  the  interactions.
These  technological  advances  have  revealed  how  important
three-dimensional  geometry  and  shape  are  for  molecular  description.
They  have  also  highlighted  one  of  the  main  problems:  automatic
recognition  of  3D  molecular  objects.

This  is  an  important  problem  because  knowledge  in  chemistry  has
to  be  widely  distributed  for  general  uses  such  as  drug  design,
spectroscopy,  statistics,  etc.  For  this  purpose,  one  often  needs
to  obtain  global  information  from  structural,  and  sometimes
substructural,  information.  The  representation  of  the  structure
should  therefore  be  treated  as  one  of  the  main  parameters  for  the
efficiency  of  structure-property  relationships  and  database
management  systems.

However,  the  current  nomenclature  does  not  involve  3D  geometry.
This  may  be  why  one  is  not  accustomed  to  it,  and  it  could
explain  why  molecular  geometry  is  described  in  only  one  way,
namely  by  Cartesian  coordinates.

Consequently,  we  propose  a  new  description  method,  called  the
OCTANG  method.  The  molecular  descriptor  we  create  looks  like  a
topological  one,  but  the  edges  of  the  chemical  graph  carry  3D
information.  This  allows  easy  and  fast  handling  of  molecular
information,  and  we  suggest  a  set  of  applications  in  this  paper.
Before  presenting  the  OCTANG  method,  we  briefly  summarise  the
current  concepts  used  in  automatic  structural  recognition.


Molecular  recognition  and  similarity: 

Three-dimensional  structural  or  substructural  recognition  takes
place  in  information  retrieval  from  databases,  but  also  in
structure-property  relationships.  In  both  fields,  the  common
concept  remains  similarity  searching,  according  to  the  lock-and-key
concept.  In  fact,  we  always  try  to  explain  an  experimental  event
with  structural  characteristics:


EXPERIMENTAL  EVENT

STRUCTURAL  SIMILARITY  SEARCHING
-  3D  GEOMETRY,  SHAPES
-  TOPOLOGY  (indices)  (DARC  ...)
-  PHYSICAL  CHEMISTRY  (spectroscopy,  quantum  mechanics)
-  STATISTICS


Similarity  searching  often  involves  topological  indices.  They  are
easily  calculated  and  closely  related  to  the  chemical  graph  (2).
Basically,  they  are  based  on  node  and  path  counts.  One  of  the
first  was  the  WIENER  parameter,  designed  to  predict  boiling
points  (3).
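The Wiener parameter is the sum of the shortest-path distances, counted in bonds, over all pairs of atoms of the hydrogen-suppressed graph. A short Python sketch, with one breadth-first search per atom (the function name and graph representation are ours):

```python
from collections import deque


def wiener_index(graph):
    """Sum of shortest-path (bond) distances over all unordered atom pairs.
    graph: {atom: [neighbours]} for the hydrogen-suppressed skeleton."""
    total = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both ends
```

For n-butane this gives 10 and for isobutane 9: the more branched isomer has the smaller index, in line with its lower boiling point.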

Randic,  one  of  the  main  workers  in  this  area,  proposed  a  first
topological  index  (4),  then  a  second  one,  called  the  "molecular
identification  number"  (5).  The  latter  was  supposed  to  give  a
unique  description  of  a  molecular  graph,  but  a  recent  systematic
calculation  for  alkanes  with  up  to  20  carbon  atoms  revealed  a
hundred  non-isomorphic  pairs  with  identical  indices  (6).  In  any
case,  the  concept  of  path  counting  is  now  well  known  as  molecular
connectivity,  and  remains  one  of  the  most  widely  used  methods  in
correlation  analysis  (7).
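The best-known connectivity index, the first-order Randic index, is χ = Σ over bonds ij of (d_i d_j)^(-1/2), where d_i is the number of non-hydrogen neighbours of atom i. A Python sketch, assuming integer atom numbers so that each bond is visited once:

```python
import math


def randic_index(graph):
    """First-order connectivity index: sum over bonds of 1/sqrt(d_i * d_j),
    with d the vertex degree in the hydrogen-suppressed graph."""
    return sum(
        1.0 / math.sqrt(len(graph[u]) * len(graph[v]))
        for u in graph for v in graph[u] if u < v  # each bond once
    )
```

n-Butane gives √2 + 1/2 ≈ 1.914 and isobutane √3 ≈ 1.732. Since χ is a sum of bond contributions, different graphs can share a value, which is the non-uniqueness problem discussed next.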

The  most  critical  problem  arises  when  two  different  graphs  have
the  same  index  values.  Clearly,  topological  indices  are  not
unique,  and  the  best  way  to  use  them  in  databases  remains  the
design  of  screening  strategies.  Database  management  with  2D
descriptors  has  been  a  reality  for  ten  years  now,  with  the  DARC
and  CAS  systems  (8,9).

Another  approach  to  similarity  searching  uses  3D  molecular
geometry,  that  is,  atomic  Cartesian  coordinates.  In  this  field,  we
have  to  distinguish  Maximal  Common  Substructure  searching  from
Predefined  Common  Substructure  searching.

The  first  one,  MCS,  consists  of  identifying  the  largest  common
structural  feature  in  a  set  of  compounds.  Combinatorial  problems
therefore  arise  when  generating  intermediate  solutions  during  the
so-called  growing  steps.  In  the  second  case,  a  molecular  fragment
has  to  be  searched  for  in  larger  structures.  Here  are  some  of  the
current  methods:

-  Crandell  and  Smith  (10)
-  Clique  detection  (11,12)
-  Lesk  (13)
-  Ullmann  (14)
-  Set  reduction  (14)
-  Stouch/Jurs  (15)


All  these  methods,  except  that  of  Stouch  and  Jurs,  use  interatomic
distances  represented  in  a  distance  matrix.  Consequently,  the
common  purpose  of  these  algorithms  is  to  find  an  isomorphism,  or
a  subgraph,  in  a  larger  graph.  Each  of  them  generally  requires
prohibitive  CPU  time,  so  these  approaches  cannot  easily  be
applied.

The  Stouch  method  is  based  on  a  3D  grid,  to  define  and  compare 
molecular  shapes  in  a  discrete  manner. 

Finally,  two  distinct  strategies  are  emerging:

-  Structural  database  management,  which  needs  speed.  The  only
way  seems  to  be  bit-screen  methods,  associated  with  more
precise  methods  operating  on  relevant  database  subsets.

-  Structural  or  electronic  similarity  searching,  which  proceeds  in
quite  a  different  manner.  3D  grid  methods  and  shape
superposition  are  used  now,  but  a  new  kind  of  similarity
index,  based  on  electron  densities,  has  also  been  proposed.
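Bit screening can be sketched as follows: each structural feature sets one bit of a fixed-length mask, and a molecule can only contain a query fragment if its mask has every bit of the query mask set. The feature strings and the character-sum hash are assumptions for illustration (Python's built-in `hash` is randomised between runs for strings, so it is avoided), and the function names are hypothetical.

```python
def fingerprint(features, n_bits=64):
    """Fold a set of structural feature strings into an integer bit mask,
    using a simple deterministic hash (sum of character codes)."""
    mask = 0
    for f in features:
        mask |= 1 << (sum(map(ord, f)) % n_bits)
    return mask


def passes_screen(query_mask, molecule_mask):
    """A molecule can contain the query only if it has every query bit set."""
    return (query_mask & molecule_mask) == query_mask
```

The screen never rejects a true hit, but different features may share a bit, so a molecule that passes still has to be checked by one of the precise methods on the reduced subset.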

Here  we  have  attempted  to  design  a  new  molecular  descriptor,  which  is
basically  a  topological  one,  including  edge  chromatism  (colouring)
to  describe  three-dimensional  relative  positions.  This  is  the  OCTANG
method.
THE  OCTANG  METHOD:

The  description  of  a  set  of  points  may  be  performed  by  their
relative  positions  instead  of  Cartesian  coordinates.  It  is  a
local  perception  of  a  point's  environment,  which  is  easily  extended
to  3D  space.  This  space  is  then  subdivided  into  8  zones  (octants)
according  to  the  X,  Y  and  Z  axes.


Such  a  system  can  be  moved  onto  each  atomic  position  in  a  skeleton,
to  describe  its  neighbours  without  any  dimensional  parameter.  We
chose  to  describe  only  the  first  topological  neighbours  of  each
atom,  and  to  keep  a  complete  connectivity  array.  This  point  is
important  because  the  direction  code  depends  on  the  viewpoint.

With  these  conventions,  we  can  fully  describe  a  molecule  with  two
arrays  (connectivity  and  direction)  and  a  chromatic  list.
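The octant coding of a neighbour can be sketched in Python. The bit convention (a bit set for a negative coordinate difference along x, y or z) and the treatment of points lying exactly on an axis plane are our assumptions; OCTANG's own numbering of the eight zones is not given here. Storing the code in both directions reflects its dependence on the viewpoint.

```python
def octant(origin, point):
    """Octant code 0..7 of `point` seen from `origin`: bit 0 for x, bit 1
    for y, bit 2 for z, set when the coordinate difference is negative.
    Points exactly on an axis plane go to the positive side (assumption)."""
    code = 0
    for axis in range(3):
        if point[axis] - origin[axis] < 0:
            code |= 1 << axis
    return code


def direction_array(coords, bonds):
    """coords: {atom: (x, y, z)}; bonds: iterable of (i, j) pairs.
    Returns {(i, j): octant of j seen from i} for both directions of
    every bond, since the code depends on the viewpoint."""
    directions = {}
    for i, j in bonds:
        directions[(i, j)] = octant(coords[i], coords[j])
        directions[(j, i)] = octant(coords[j], coords[i])
    return directions
```

Together with the connectivity array and the list of atom colours, this direction array is all that needs to be stored; no distance appears in it.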


Structural  identification:


The  first  problem  arising  when  trying  to  compare  two  molecules
is  their  relative  positions.  With  our  codification  system,  a
single  fragment  is  perceived  in  two  different  ways  if  it  lies  in
two  different  positions.

So  it  is  clear  that  molecular  descriptors  cannot  be  compared
strictly.  For  the  recognition  step,  we  developed  a  pathway
representation  associated  with  the  "elementary  fragment"  concept.
An  elementary  fragment  is  a  triple  of  nodes  from  the  graph.  It  is
the  smallest  object  characterised  by  an  internal  geometry,  and  it
is  the  basic  descriptor  which  allows  pathway  recognition.

We  have  also  developed  the  equivalent  elementary  fragment
concept,  to  gather  into  one  group  all  the  elementary  fragments
with  the  same  internal  geometry.  Thus,  we  can  recognise  any
fragment,  whatever  its  global  orientation.
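One possible reading of "same internal geometry", which is our assumption and not necessarily the OCTANG definition: an elementary fragment a-b-c is described by the octants of a and c seen from the central atom b, and two fragments are equivalent when one of the 24 rotations of the cube (which map octants onto octants) carries one description onto the other, in either neighbour order. A Python sketch under that assumption, with octant codes read as sign vectors:

```python
from itertools import permutations, product


def _signs(code):
    """Octant code (bit set = negative coordinate) -> sign vector."""
    return tuple(-1 if code >> axis & 1 else 1 for axis in range(3))


def _rotations():
    """The 24 proper rotations of the cube as signed permutations."""
    rots = []
    for perm in permutations(range(3)):
        inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
        for signs in product((1, -1), repeat=3):
            det = (-1) ** inversions * signs[0] * signs[1] * signs[2]
            if det == 1:  # keep rotations, drop reflections
                rots.append((perm, signs))
    return rots


ROTATIONS = _rotations()


def _apply(rot, vec):
    perm, signs = rot
    return tuple(signs[k] * vec[perm[k]] for k in range(3))


def equivalent(frag1, frag2):
    """frag = (octant of first neighbour, octant of second neighbour),
    both seen from the central atom."""
    a1, c1 = map(_signs, frag1)
    targets = {tuple(map(_signs, frag2)), tuple(map(_signs, frag2[::-1]))}
    return any((_apply(r, a1), _apply(r, c1)) in targets for r in ROTATIONS)
```

Because only rotations are allowed, the angle between the two neighbour directions is preserved: a fragment with neighbours in opposite octants is never equivalent to one with neighbours in adjacent octants.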

The  pathway  decomposition  is  guided  by  a  small  set  of  rules,  and
finally  any  graph  is  perceived  as  a  main  pathway  with  a  set  of
ramifications.

The  main  pathway  is  the  topologically  longest  one,  identified  by
a  depth-first  search  algorithm  starting  from  high-connectivity
nodes.  The  ramifications  may  be  defined  in  a  "recursive"  manner:
any  ramification  may  be  attached  either  to  the  main  pathway  or  to
another  ramification.  Each  pathway  or  ramification  is  then
completely  described  by:

-  Length
-  Chromatic  node  list
-  Cyclisation  node  (ring  closure)
-  Parent  pathway
-  Anchoring  node  on  the  parent  pathway
-  Contact  node  (closure  on  the  parent  pathway)
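The main-pathway search can be sketched as an exhaustive depth-first search for the longest simple path. Trying start nodes in order of decreasing connectivity, as the text suggests, only decides which of several equally long paths is kept; the search is exponential in the worst case, so this Python version (hypothetical function name) is only meant for small skeletons, and ramifications are not extracted.

```python
def main_pathway(graph):
    """Longest simple path by exhaustive depth-first search.
    graph: {node: [neighbours]}. Start nodes are tried by decreasing
    connectivity; only a strictly longer path replaces the current best."""
    best = []

    def dfs(path, visited):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(path, visited)
                path.pop()  # backtrack
                visited.remove(nxt)

    for start in sorted(graph, key=lambda n: -len(graph[n])):
        dfs([start], {start})
    return best
```

For the isopentane skeleton C1-C2(-C5)-C3-C4, the main pathway found is 1-2-3-4, and C5 would become a ramification anchored on node 2.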

It  must  be  kept  in  mind  that  the  pathway  representation  exists
only  during  the  recognition  step.  In  practice,  a  molecule  is
stored  as  a  triple:  chromatic  list,  connectivity  array,
direction  array.

The  pathway  decomposition  and  the  elementary  fragment  concept
finally  lead  to  a  complete  description  system.  We  now  present  a
schematic  view  of  the  recognition  step.

A  3D  request  is  presented  as  a  set  of  related  pathways,  and  the 
most  straightforward  strategy  is  to  sequentially  search  a  set  of 
equivalent  pathways  in  a  larger  structure. 

This  is  performed  iteratively,  each  pathway  being  progressively
recognized  through  its  elementary  fragments.  Here,  the  equivalent
fragment  concept  is  of  prime  importance  to  decide  whether  two
triples  are  congruent.  While  each  pathway  is  being  extended,  we
use  a  partial  overlapping  of  successive  fragments  to  keep
consistency  along  the  whole  pathway.

Finally,  a  pathway  is  accepted  only  if  all  its  parameters  match 
the  reference  parameters  (ring  closure,  contact  node...). 

In  practice,  these  parameters  are  used  to  direct  the  search
procedure  towards  the  most  promising  paths.  Before  concluding,  we
can  summarise  the  recognition  strategy  as  follows:

-  Given  a  molecule  described  by  its  chromatic  list,  connectivity
array  and  direction  array.

-  Given  a  requested  structural  fragment,  described  by  its
pathways.
Does  the  fragment  lie  in  the  molecule?

The  general  procedure  to  answer  this  question  is  the
following:

-  Select  in  the  molecule  all  the  nodes  with  the  same
chromatism  and  connectivity  as  the  first  node  of  the
requested  main  pathway.

-  From  each  of  these  nodes,  search  for  a  pathway  similar  to
the  requested  main  pathway.

-  For  each  first  node  which  has  succeeded  in  generating  a
main  pathway,  apply  the  same  procedure  to  find  the
ramifications.

This  step  is  quickly  achieved  since  we  know  the  positions  of
the  anchoring  nodes  in  the  main  pathway.
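The first two steps of this procedure can be sketched in Python: seed nodes are selected by chromatism and connectivity, then each seed is extended node by node along the requested colour sequence with backtracking. The geometric test on equivalent elementary fragments, which OCTANG applies at each step, is deliberately left out, and the function and argument names are hypothetical.

```python
def match_main_pathway(colours, graph, query_colours, first_degree):
    """colours: {node: element}; graph: {node: [neighbours]}.
    query_colours: element sequence of the requested main pathway;
    first_degree: connectivity of its first node.
    Returns every node sequence of the molecule that follows the pathway."""
    seeds = [n for n in graph
             if colours[n] == query_colours[0] and len(graph[n]) == first_degree]
    matches = []

    def extend(path):
        if len(path) == len(query_colours):
            matches.append(tuple(path))
            return
        wanted = query_colours[len(path)]
        for nxt in graph[path[-1]]:
            if nxt not in path and colours[nxt] == wanted:
                path.append(nxt)
                extend(path)
                path.pop()  # backtrack

    for seed in seeds:
        extend([seed])
    return matches
```

In an acetamide-like skeleton (methyl C1, carbonyl C2, O3, N4), the requested pathway N-C-C with a terminal nitrogen yields the single match (4, 2, 1); the seed filter discards every other atom before any extension is tried.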

This  strategy  has  been  successfully  applied  to  3D  structural
recognition  in  many  different  cases.  But  we  have  to  mention  here
some  distinctive  features:

-  One  can  obtain  many  solutions  that  differ  only  in  how  the  nodes
are  used  (pathway  or  ramification).  These  solutions  have  to  be
filtered.

-  Another  redundancy  may  arise  because  different  nodes  can  serve
as  the  first  node  of  the  same  pathway  (1-2-3,  3-2-1).

-  Partial  solutions  are  possible.  Our  current  algorithm  accepts  a
solution  if  the  main  pathway  is  shorter  than  the  requested  one,
but  takes  the  anchoring  nodes  into  account.

In  practice,  we  avoid  redundancy  by  filtering  multiple  solutions
just  before  storing  them.
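Filtering before storage can be as simple as keeping one solution per set of matched molecule nodes. This Python sketch (a hypothetical helper) merges reversed traversals such as 1-2-3 and 3-2-1; note that it would also merge genuinely different mappings onto the same atoms, which may or may not be desired.

```python
def filter_solutions(solutions):
    """Keep one representative per set of matched molecule nodes, so that
    reversed traversals and other reorderings are stored only once.
    The order of first appearance is preserved."""
    seen = set()
    kept = []
    for sol in solutions:
        key = frozenset(sol)
        if key not in seen:
            seen.add(key)
            kept.append(sol)
    return kept
```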


CONCLUDING  REMARKS: 

The  first  application  of  the  OCTANG  method  was  the  recognition  of  a
predefined  3D  structural  fragment.  The  system  has  proved
efficient.  We  are  now  working  on  the  MCS  problem  and  aiming  at  a
breadth-first  search  procedure  using  elementary  fragments.  This
part  of  the  system  should  allow  the  determination  of  a  structural
pharmacophore.

We  are  also  studying  the  codification  of  stereochemistry.  The
equivalent  fragment  concept  is  not  well  suited  to  dealing
correctly  with  stereochemistry,  so  we  plan  to  include  this
knowledge  in  a  screen  system,  designed  as  a  node  and  its
first-order  topological  environment,  for  selection  in  a  database.

Last  but  not  least,  we  are  currently  working  on  a  database
structure  using  our  representation  mode  (chromatic  list,
connectivity  and  direction  arrays).  In  such  a  structure  it  would
be  easy  to  design  efficient  screens  and  to  handle  structural  and
substructural  requests.
MOLECULAR GRAPHICS FOR THE MACINTOSH

J.M.  CENSE 

Laboratoire d'informatique chimique, Ecole Nationale Superieure de Chimie de
Paris, 11, rue Pierre et Marie Curie - 75231 Paris Cedex 05 - FRANCE


SUMMARY 

Two programs, MolDraw and MolView, implemented on the Apple Macintosh,
are described. These programs give access to good-quality 3-D pictures of atomic and
molecular objects whose coordinates have been read from files formatted
according to several standards. Depth-cueing, stereo views and animation are used to
enhance the perception of depth. High-quality hard copies of all pictures are readily
obtained on laser printers.


INTRODUCTION 

Now that the Apple Macintosh and high-resolution laser printers are widely available,
it is worthwhile to have easy access to specifically tailored packages for
creating and manipulating chemical drawings. With these programs, most of the
visualization work classically performed on expensive workstations can be
accomplished on an inexpensive microcomputer. Furthermore, the drawings so obtained
may be printed directly on a wide variety of laser printers, or put on the Macintosh
clipboard to be processed later by word processors, desktop publishing programs, presentation
managers or graphics software.


HARDWARE  AND  SOFTWARE 

MolDraw runs on any Macintosh with 1 Mb of memory and is mainly oriented towards
black and white. It accepts structures of up to 512 atoms or 600 bonds. MolView is
written to run only on the Macintosh II, with a minimum of 2 Mb of memory and 256
colors (or at least 256 gray tones). It handles structures of up to 8000 atoms or bonds.

With a palette of 256 colors, objects can be displayed in 8 colors with 32 levels of
shading each (8 x 32 = 256), which is sufficient to give an excellent rendering of shaded
spheres. The visible region of the spheres is determined quickly by region-manipulation
techniques (ref. 1); shading is then performed by stamping precomputed templates (one for
each atom type) into the calculated visible region.
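The palette arithmetic and template stamping described above can be sketched as follows. This is an illustration under assumptions, not the programs' code: the light direction, the Lambert (diffuse) shading model and the dictionary-based template and visible region are hypothetical stand-ins for the Toolbox region and bit-transfer routines mentioned in the text:

```python
import math
from typing import Dict, Set, Tuple

N_COLORS = 8   # hues (from the text)
N_SHADES = 32  # shading levels per hue; 8 x 32 = 256 palette entries

Pixel = Tuple[int, int]

def palette_index(color: int, shade: int) -> int:
    """Map (hue, shade level) to one entry of a 256-entry palette."""
    if not (0 <= color < N_COLORS and 0 <= shade < N_SHADES):
        raise ValueError("color or shade out of range")
    return color * N_SHADES + shade

def sphere_template(radius: int, color: int,
                    light: Tuple[float, float, float] = (-1.0, 1.0, 2.0)) -> Dict[Pixel, int]:
    """Precompute palette indices for every pixel of a shaded sphere of one atom type.
    The light direction is an assumed default, not taken from the text."""
    norm = math.sqrt(sum(c * c for c in light))
    lx, ly, lz = (c / norm for c in light)
    template: Dict[Pixel, int] = {}
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            d2 = x * x + y * y
            if d2 > radius * radius:
                continue  # outside the sphere's silhouette
            z = math.sqrt(radius * radius - d2)
            # (x, y, z) / radius is the unit surface normal at this pixel
            intensity = max(0.0, (x * lx + y * ly + z * lz) / radius)
            shade = min(N_SHADES - 1, int(intensity * N_SHADES))
            template[(x, y)] = palette_index(color, shade)
    return template

def stamp(frame: Dict[Pixel, int], template: Dict[Pixel, int],
          center: Pixel, visible: Set[Pixel]) -> None:
    """Copy template pixels into the frame, but only inside the atom's visible region."""
    cx, cy = center
    for (dx, dy), index in template.items():
        p = (cx + dx, cy + dy)
        if p in visible:
            frame[p] = index
```

Because one template is computed per atom type, drawing an atom reduces to a masked copy, which is what makes the approach fast.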

Thanks to the fast bit-transfer tools implemented in the Macintosh Toolbox, a color
space-filling picture of a 500-atom molecule can be obtained in less than 15 seconds at
the screen resolution of the Macintosh II (640 x 480, 72 pixels per inch). Most
models (<50 atoms) are displayed in less than 3 seconds.

Both  programs  are  written  in  Lightspeed  C  and  follow  the  general  Macintosh 
interface  with  windows,  scrolling  menus,  dialogs,  keyboard  equivalents  and  desk 
accessories. 

Several printers have been tested: the Apple ImageWriter II and LaserWriter, and the QMS
ColorScript 100 and Tektronix 4693 DX color laser printers. Full-page black and
white images at maximum resolution (300 ppi) are printed in less than 3 minutes.


3-D  STRUCTURE  INPUT 

The MolDraw and MolView programs read text files of atomic coordinates in
several formats: SYBYL (ref. 2), PDB (ref. 3) or CHEM3-D (ref. 4). These text
files are either transferred from a host computer or created on the Macintosh using
molecular builders such as CHEM3-D (ref. 4) or PCMODEL (ref. 5).

An in-house format for Cartesian, internal or fractional coordinates is also provided.
Internal-coordinate files allow rapid construction of small molecules, whereas
fractional-coordinate files are used to hold crystallographic information.

These  crystallographic  files  include  cell  constants,  fractional  coordinates, 
symmetry  operators  and  repetition  operators  to  generate  several  contiguous  cells. 
Therefore,  processed  structures  are  not  restricted  to  molecular  structures. 
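Converting fractional to Cartesian coordinates, and generating equivalent positions with symmetry and repetition operators, can be sketched as follows. This assumes the common orientation convention (a along x, b in the xy plane) and symmetry operators given as a 3x3 matrix plus a translation in fractional coordinates; the actual file layout and conventions of the programs are not described in the text, and the function names are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Mat = Tuple[Vec, Vec, Vec]

def frac_to_cart_matrix(a: float, b: float, c: float,
                        alpha: float, beta: float, gamma: float) -> Mat:
    """Fractional -> Cartesian matrix (a along x, b in the xy plane); angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return ((a, b * cg, c * cb),
            (0.0, b * sg, c * (ca - cb * cg) / sg),
            (0.0, 0.0, volume / (a * b * sg)))

def mat_vec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vec:
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))

def apply_symop(rot, trans, frac: Sequence[float]) -> Vec:
    """x' = R x + t, all in fractional coordinates."""
    r = mat_vec(rot, frac)
    return tuple(r[i] + trans[i] for i in range(3))

def expand(frac_atoms: List[Vec], symops, cells=((0, 0, 0),)) -> List[Vec]:
    """Apply every symmetry operator, wrap into the unit cell, then repeat over the cells."""
    out: List[Vec] = []
    for f in frac_atoms:
        for rot, trans in symops:
            g = tuple(x % 1.0 for x in apply_symop(rot, trans, f))  # back into [0, 1)
            for n in cells:
                out.append(tuple(g[i] + n[i] for i in range(3)))
    return out

# Cartesian coordinates are then mat_vec(M, frac) for each generated position.
```

Atoms on special positions are generated more than once by this sketch; a real implementation would remove such duplicates.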


3-D  STRUCTURE  MANIPULATION 

The MolDraw and MolView programs offer a wide variety of
display modes: ball and stick, space filling, wire-frame, ribbon drawings of protein
and DNA molecules (ref. 6), and dot surfaces. Stereo pairs of several models are
available with MolDraw.

Parameters,  such  as  covalent  radii,  atomic  radii,  atomic  colors,  atomic  patterns 
and  gray  tones  are  adjustable. 

Classical functions such as X, Y and Z rotations, translations, scaling and
rotation around a non-ring bond are provided. An axis defined by two atoms can be
aligned with the X, Y or Z axis, and a plane defined by three atoms can be moved onto the
XY, YZ or ZX plane. A molecule read from a crystallographic file can be projected onto an
(hkl) plane. Several molecules can easily be compared after alignment along chosen axes
and planes.
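Aligning an interatomic axis with a coordinate axis, and rotating part of a molecule about a bond, both reduce to rotations about an arbitrary axis. A minimal sketch using Rodrigues' rotation formula; the function names are hypothetical, and `rotate_about_bond` assumes the caller already knows which atoms lie on the moving side of the bond (finding them requires the connectivity, which is not shown):

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _add(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def _dot(a, b) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _unit(a) -> Vec:
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def rotate(v, axis: Vec, angle: float) -> Vec:
    """Rodrigues' formula: rotate v by `angle` radians about the unit vector `axis`."""
    c, s = math.cos(angle), math.sin(angle)
    kxv, kdv = _cross(axis, v), _dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kdv * (1 - c) for i in range(3))

def align_to_axis(coords: List[Vec], i: int, j: int,
                  target: Vec = (0.0, 0.0, 1.0)) -> List[Vec]:
    """Move atom i to the origin and rotate so that the i->j direction points along `target`."""
    moved = [_sub(p, coords[i]) for p in coords]
    u, t = _unit(moved[j]), _unit(target)
    axis, cos_a = _cross(u, t), _dot(u, t)
    sin_a = math.sqrt(_dot(axis, axis))
    if sin_a < 1e-12:  # u already parallel or antiparallel to t
        if cos_a > 0:
            return moved
        perp = _cross(t, (1.0, 0.0, 0.0))
        if _dot(perp, perp) < 1e-24:
            perp = _cross(t, (0.0, 1.0, 0.0))
        axis, angle = _unit(perp), math.pi  # half-turn about any perpendicular axis
    else:
        axis, angle = _unit(axis), math.atan2(sin_a, cos_a)
    return [rotate(p, axis, angle) for p in moved]

def rotate_about_bond(coords: List[Vec], i: int, j: int,
                      moving: Sequence[int], angle: float) -> List[Vec]:
    """Rotate the atoms listed in `moving` by `angle` radians about the axis of bond i-j."""
    axis = _unit(_sub(coords[j], coords[i]))
    out = list(coords)
    for k in moving:
        out[k] = _add(coords[j], rotate(_sub(coords[k], coords[j]), axis, angle))
    return out
```

Moving a three-atom plane onto XY works the same way: align the plane normal (the cross product of two in-plane bond vectors) with Z.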

Information about interatomic distances, angles or torsions is available by
selecting the pertinent atoms.
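The three geometric quantities reported on atom selection can be computed as follows. This is a generic sketch, not the programs' code; the torsion uses the usual sign convention (positive when, looking from the second atom towards the third, the front bond must turn clockwise to eclipse the back bond):

```python
import math
from typing import List, Sequence

Vec = Sequence[float]

def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[i] - b[i] for i in range(3)]

def _dot(a: Vec, b: Vec) -> float:
    return sum(a[i] * b[i] for i in range(3))

def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def distance(p: Vec, q: Vec) -> float:
    d = _sub(p, q)
    return math.sqrt(_dot(d, d))

def bond_angle(p: Vec, q: Vec, r: Vec) -> float:
    """Angle p-q-r at atom q, in degrees."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding noise

def torsion(p: Vec, q: Vec, r: Vec, s: Vec) -> float:
    """Dihedral angle p-q-r-s in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(q, p), _sub(r, q), _sub(s, r)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

print(distance((0, 0, 0), (3, 4, 0)))                         # 5.0
print(round(torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
```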

No energy minimization of structures is currently available. Commercial software
implemented on the Macintosh II, such as CHEM3D (ref. 4) or PCMODEL (ref. 5),
must be used for this purpose.
DEPTH PERCEPTION

Several techniques have been tested for enhancing depth perception: depth-cueing,
stereopsis and animation.

Depth-cueing 

Depth-cueing is easily implemented on a 256-color monitor with wire-frame,
ribbon or dot-surface models. On a black and white monitor, depth-cueing requires
a simplification analogous to the Shademol algorithm (ref. 7). However,
with space-filling or ball and stick models, hidden-surface elimination makes
depth-cueing unnecessary.
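On a 256-color display, depth-cueing amounts to choosing, for each line segment, one of the shade levels of its hue according to depth. A minimal sketch, assuming 32 levels per hue as in the palette described earlier, +z pointing towards the viewer, and coloring each wire-frame segment by the depth of its midpoint; these choices and the function names are illustrative, not the programs' documented behavior:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def depth_cue_level(z: float, z_near: float, z_far: float, n_levels: int = 32) -> int:
    """Nearest depth -> brightest level (n_levels - 1); farthest depth -> level 0."""
    if z_near == z_far:
        return n_levels - 1
    t = (z - z_far) / (z_near - z_far)  # 1 at the near plane, 0 at the far plane
    t = min(1.0, max(0.0, t))
    return round(t * (n_levels - 1))

def cue_segments(segments: Sequence[Tuple[Point, Point]], hue: int,
                 n_levels: int = 32) -> List[int]:
    """Palette index for each wire-frame segment, from the depth of its midpoint."""
    zs = [pt[2] for seg in segments for pt in seg]
    z_near, z_far = max(zs), min(zs)  # assumption: +z points towards the viewer
    indices = []
    for p, q in segments:
        z_mid = (p[2] + q[2]) / 2
        indices.append(hue * n_levels + depth_cue_level(z_mid, z_near, z_far, n_levels))
    return indices
```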

Stereopsis 

Stereo  views  are  projected  in  either  a  relaxed  mode,  a  mirror  mode  (one  view  is 
seen  after  reflection  on  a  mirror)  or  a  crossed  mode.  The  stereo  separation  and  the 
rotation  between  stereo  views  are  adjustable.  Best  results  are  obtained  after  printing 
on  a  laser  printer  at  maximum  resolution. 


Fig. 1. Stereo pair of Spironolactone crystallized from CH3CN (ref. 8). Details,
including the two CH3CN molecules per cell, are clearly visible.
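A stereo pair of the kind described above can be produced by rotating the model by plus and minus half the stereo angle about the vertical axis and placing the two orthographic projections side by side. A minimal sketch; the default 6-degree angle, the separation of 200 units, the +z-towards-viewer convention and the function names are assumptions, not values from the text:

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]

def rotate_y(p: Sequence[float], angle_deg: float) -> Point3:
    """Rotate a point about the vertical (y) axis."""
    a = math.radians(angle_deg)
    x, y, z = p
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))

def stereo_pair(coords: Sequence[Sequence[float]], rotation_deg: float = 6.0,
                separation: float = 200.0, mode: str = "relaxed"):
    """Return 2-D (x, y) points for the image drawn on the left and on the right."""
    half = rotation_deg / 2
    left_view = [rotate_y(p, +half)[:2] for p in coords]   # as seen by the left eye
    right_view = [rotate_y(p, -half)[:2] for p in coords]  # as seen by the right eye
    if mode == "crossed":
        left_view, right_view = right_view, left_view      # right-eye image on the left
    elif mode == "mirror":
        right_view = [(-x, y) for x, y in right_view]      # viewed after reflection in a mirror
    elif mode != "relaxed":
        raise ValueError(f"unknown stereo mode: {mode}")
    dx = separation / 2
    left = [(x - dx, y) for x, y in left_view]
    right = [(x + dx, y) for x, y in right_view]
    return left, right
```

Making the angle and separation parameters mirrors the adjustable stereo separation and rotation mentioned in the text.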

Animation 

Animation  is  a  very  effective  technique  for  displaying  depth  relationships. 
Successive  views  of  a  3-D  model,  projected  after  rotation  around  any  axis,  are  stored 
in  memory  and  later  transferred  to  the  screen. 

As the sense of rotation cannot be inferred from orthographic projections of
wire-frame models, animating these models is very uncomfortable to watch. With
depth-cued wire-frame, ball and stick or space-filling models, the sense of rotation can
easily be deduced.

The number of animated images and the animation speed depend on the size of the
picture and on the amount of available memory. About 15 color pictures covering 50%
of the screen (Macintosh II) can be transferred every second between memory and
screen. Each picture uses 150 Kb of memory, so that 30 pictures are stored prior to
animation. For a black and white picture of the same size, the transfer is 8 times
faster and memory usage 8 times smaller.
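The memory figures above follow from the screen size and pixel depth: 256 colors need 8 bits per pixel, black and white needs 1. A short worked check; the 4.5 Mb budget in the example is a hypothetical value chosen only to show how the frame count is obtained:

```python
def frame_bytes(width: int, height: int, screen_fraction: float, bits_per_pixel: int) -> int:
    """Bytes needed to store one off-screen picture covering part of the screen."""
    pixels = round(width * height * screen_fraction)
    return pixels * bits_per_pixel // 8

def frames_that_fit(memory_bytes: int, bytes_per_frame: int) -> int:
    """How many pre-computed frames can be held in a given memory budget."""
    return memory_bytes // bytes_per_frame

color = frame_bytes(640, 480, 0.5, 8)  # 256 colors -> 8 bits per pixel
mono = frame_bytes(640, 480, 0.5, 1)   # black and white -> 1 bit per pixel
print(color, color / 1024, color // mono)          # 153600 150.0 8
print(frames_that_fit(int(4.5 * 2**20), color))    # 30 (hypothetical 4.5 Mb budget)
```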


PICTURE  AND  ANIMATION  SAVING  AND  RECOVERING 
Pictures and animations can be saved on disk, preferably a hard disk, to be
replayed later. A typical color screen occupies 150 Kb of disk storage; a typical color
animation uses, after packing, about 1.5 Mb. Several screens and
several animations can be reloaded one after another. Commercial movie-making programs can
also be used to animate screens pasted onto the clipboard.
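The text does not say how pictures are packed. One plausible candidate on the Macintosh is the PackBits run-length scheme of the Toolbox; the sketch below implements that encoding under this assumption, purely to illustrate why screens with large uniform areas compress well. A header byte n from 0 to 127 means "copy the next n + 1 bytes literally"; n from -1 to -127 means "repeat the next byte 1 - n times":

```python
def pack_bits(data: bytes) -> bytes:
    """PackBits-style run-length encoding (runs and literal blocks of up to 128 bytes)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append((257 - run) & 0xFF)  # header -(run - 1) stored as an unsigned byte
            out.append(data[i])
            i += run
        else:
            start = i
            i += 1
            # extend the literal block until a run of 2 starts or 128 bytes are collected
            while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
                i += 1
            out.append(i - start - 1)
            out.extend(data[start:i])
    return bytes(out)

def unpack_bits(packed: bytes) -> bytes:
    """Inverse of pack_bits."""
    out = bytearray()
    i = 0
    while i < len(packed):
        h = packed[i]
        h = h - 256 if h > 127 else h  # reinterpret as a signed byte
        i += 1
        if h >= 0:
            out.extend(packed[i:i + h + 1])
            i += h + 1
        elif h != -128:                # -128 is a no-op
            out.extend(bytes([packed[i]]) * (1 - h))
            i += 1
    return bytes(out)
```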


CONCLUSION 

The MolDraw and MolView programs were written to give inexpensive access to many
functions classically provided by more expensive workstations. With these
programs, workstations are relieved of time-consuming picture processing, and hard
copies are usually more easily obtained than with workstations.

Contact the author for information about the availability of these programs.
PRESENTATION OF GLOS
G.A.  LANGLET 

Commissariat à l'Energie Atomique IRDI-DESICP-DLPC-Service Chimie
Moleculaire, BP 121, 91191 GIF-SUR-YVETTE CEDEX (France)

Telex  :  604641  ENERG.  SPP+  ;  Fax  :  (33)  1  69  08  79  63. 

GLOS is an all-purpose software integrator that provides a simple
way to call numerous utilities, inter alia editors, and programs that may
be written in various languages, among them APL, Fortran, C, Pascal and
assembler. It runs on PC/AT- or 80386-compatible microcomputers and
uses less memory than conventional integrators. All functions are
accessed through pop-up menus, using either the keypad or a mouse
(e.g. IBM, Logitech, Microsoft). On-line documentation is provided; users
can modify it when they integrate their own functions. GLOS requires
APL*PLUS*PC (by STSC Inc., version 6 or a later version).

GLOS is ergonomic and easy to use, especially in the context of a
research laboratory or a teaching environment. It was presented at
the Nancy meeting together with BEMOL, the molecule builder and
plotter (see the following description).
PRESENTATION OF BEMOL
G.A. LANGLET
Commissariat à l'Energie Atomique IRDI-DESICP-DLPC-Service Chimie
Moleculaire, BP 121, 91191 GIF-SUR-YVETTE CEDEX (France).

Telex : 604641 F ENERG. SPP+ ; Fax : (33) 1 69 08 79 63.
BEMOL is a molecular builder and plotter designed for PC/AT- or
80386-compatible microcomputers with an EGA or VGA graphics card. It makes wide use of
automatic pop-up menus and of the mouse. BEMOL requires APL*PLUS*PC version 6 or later.

Only native ASCII files are used for input and output, so that the data can easily be
transferred to larger machines for any further, more complex processing.

Molecules can be built, with automatic recognition of the data type, from any of the following:

-  files  with  Cartesian  co-ordinates, 

-  files  with  oblique  co-ordinates  in  any  crystal  cell, 

- files containing bond lengths, bond angles and torsion angles.
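Building Cartesian coordinates from bond lengths, bond angles and torsion angles (a Z-matrix) can be sketched with the standard "place the next atom from three reference atoms" construction. This is a generic illustration, not BEMOL's code; the row layout `(j, r, k, theta, l, phi)` (bonded atom, bond length, angle atom, angle, torsion atom, torsion) and the function names are hypothetical:

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]

def _sub(a, b) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _unit(a) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)

def place_atom(a: Vec, b: Vec, c: Vec, bond: float, angle_deg: float, torsion_deg: float) -> Vec:
    """Position of atom d such that |cd| = bond, angle b-c-d and torsion a-b-c-d are as given."""
    theta, phi = math.radians(angle_deg), math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))
    m = _cross(n, bc)
    # coordinates of d in the local frame (bc, m, n), then mapped back to the global frame
    local = (-bond * math.cos(theta),
             bond * math.sin(theta) * math.cos(phi),
             bond * math.sin(theta) * math.sin(phi))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i] for i in range(3))

Row = Tuple[Optional[int], Optional[float], Optional[int],
            Optional[float], Optional[int], Optional[float]]

def zmatrix_to_cartesian(rows: Sequence[Row]) -> List[Vec]:
    coords: List[Vec] = []
    for i, (j, r, k, theta, l, phi) in enumerate(rows):
        if i == 0:
            coords.append((0.0, 0.0, 0.0))                           # first atom at the origin
        elif i == 1:
            coords.append((coords[j][0] + r, coords[j][1], coords[j][2]))  # second along x
        elif i == 2:
            dummy = (coords[k][0], coords[k][1] + 1.0, coords[k][2])  # keeps atom 3 in the xy plane
            coords.append(place_atom(dummy, coords[k], coords[j], r, theta, 0.0))
        else:
            coords.append(place_atom(coords[l], coords[k], coords[j], r, theta, phi))
    return coords

# Water: O at the origin, two O-H bonds of 0.96 angstrom, H-O-H angle 104.5 degrees.
water = zmatrix_to_cartesian([(None,) * 6,
                              (0, 0.96, None, None, None, None),
                              (0, 0.96, 1, 104.5, None, None)])
```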

The data format is free, so that in the future tables taken from books or
papers (e.g. Acta Cryst. reprints) can be read with a scanner. Hydrogen atoms can be
generated automatically. No connectivity matrix is necessary: independent molecules and
cycles are recognized. Up to 200 to 400 atoms per file are handled, the exact limit
depending on which other facilities (e.g. the natural-language interpreter or the
steric-hindrance checker) are active at the same time.
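Recognizing bonds, independent molecules and cycles without a connectivity matrix can be sketched with a distance criterion followed by a graph analysis. This is an illustration under stated assumptions, not BEMOL's algorithm: the covalent radii are assumed typical tabulated values, the 0.45 angstrom tolerance is hypothetical, molecules are the connected components of the bond graph, and the number of independent rings is its cyclomatic number (bonds - atoms + molecules):

```python
import math
from itertools import combinations
from typing import List, Tuple

# Assumed single-bond covalent radii in angstroms (typical tabulated values).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
TOLERANCE = 0.45  # hypothetical slack added to the sum of radii, in angstroms

Atom = Tuple[str, Tuple[float, float, float]]

def perceive_bonds(atoms: List[Atom]) -> List[Tuple[int, int]]:
    """Bond two atoms when their distance is below r_i + r_j + TOLERANCE."""
    bonds = []
    for (i, (el_i, pos_i)), (j, (el_j, pos_j)) in combinations(enumerate(atoms), 2):
        if math.dist(pos_i, pos_j) < COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + TOLERANCE:
            bonds.append((i, j))
    return bonds

def molecules_and_rings(n_atoms: int, bonds: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Count independent molecules (connected components) and independent rings."""
    parent = list(range(n_atoms))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i, j in bonds:
        parent[find(i)] = find(j)
    n_molecules = len({find(a) for a in range(n_atoms)})
    n_rings = len(bonds) - n_atoms + n_molecules  # cyclomatic number of the graph
    return n_molecules, n_rings
```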

Commands are accepted either from the menus or from a command line that can be placed
anywhere on the screen, and even outside it, since the user's session is buffered. In the
latter case, commands can be written in natural language, English or French, both being
accepted at the same time. In fact, the command structure was designed to accept the main
Indo-European languages of the EEC. Command files containing natural-language instructions
can also be executed. On-line documentation is provided for the natural-language syntax.

Pictures  are  in  16  colours,  with  several  stereoscopic  options,  including  anaglyphs. 

Automatic framing and orientation, hidden parts, conic bonds, labelling and hard copy are
supported.

Students and scientists learn to use BEMOL quickly and then become good teachers for
their colleagues.

BEMOL was presented at the Nancy meeting together with GLOS, the all-purpose
software integrator (see the preceding description).
NETWORK COMPUTING OF THE EPSTEIN-NESBET SECOND-ORDER CORRECTION
TO  THE  MOLECULAR  ELECTRONIC  DENSITY.  PRELIMINARY  RESULTS. 

M.  PICARD  and  J.M.  LECLERCQ 

Laboratoire de Dynamique des Interactions Moléculaires*, University Pierre et Marie
Curie, Campus Jussieu, tour 22, 75252 Paris Cedex 05 (France)

SUMMARY 

Parallel processing of the Epstein-Nesbet second-order correction to the density
matrix, in the environment of an ETHERNET local area network of UNIX-based workstations,
is outlined. The speed-up is reported for preliminary investigations of the electronic
ground state of three test compounds: H2O, H2CO and NH2NO2. These first
results support the view that Network Computing is an efficient and low-cost tool
for theoretical investigations of molecular properties.

INTRODUCTION 

It is well known that parallel processing has been introduced because improvements
in electronic circuit speed alone cannot deliver the performance required by
many problems (refs. 1,2). Of the two techniques for introducing parallelism,
replication and pipelining, the present note emphasizes the first. The CRAY 2
and IBM 3090/600 E(S)-VF computers are typical examples of multiprocessor
mainframes offering parallel capability. This approach to parallelism, based
on limited replication of very fast processors, is very expensive but well suited to a
wide range of problems, from weather forecasting to military applications. The use of
mass-produced VLSI circuits in massively replicated architectures is an alternative
technique, which often requires drastic adaptation of the usual algorithms. Network
Computing, i.e. parallel processing in the environment of a network of workstations
and servers, is an intermediate technique with limited replication of mass-produced
VLSI circuits. In this case, the priority is to optimize the "cost / performance" ratio
(not power at any cost!).

Among the methodologies of Quantum Chemistry, the perturbation expansion (refs.
3,4) is particularly well suited to parallel processing. However, as far as we know, the first
investigations (refs. 5,6 and references therein) are limited to calculations of the
electronic correlation energy. We focus on the calculation, beyond the SCF
level, of the molecular electronic density (refs. 7-9), and the present note reports
developments of the Epstein-Nesbet (refs. 10-12) correction to the density matrix elements.
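Perturbative corrections are sums of many contributions (one per excited determinant) that can be evaluated independently, which is why replication works well here. The sketch below shows only the static block decomposition of such a sum among p workers; the term function is a hypothetical stand-in for one determinant's contribution, and Python threads are used solely to keep the example self-contained (the calculations in the text ran on separate CPUs and workstations, and threads would not give a real speed-up for pure-Python arithmetic):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def partition(n_terms: int, n_workers: int) -> List[range]:
    """Split term indices 0..n_terms-1 into n_workers contiguous, nearly equal blocks."""
    base, extra = divmod(n_terms, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def parallel_sum(term: Callable[[int], float], n_terms: int, n_workers: int) -> float:
    """Each worker sums its own block of independent terms; the partial sums are then added."""
    blocks = partition(n_terms, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda block: sum(term(k) for k in block), blocks))
    return sum(partials)

def dummy_term(k: int) -> float:
    """Hypothetical stand-in for the contribution of one excited determinant."""
    return 1.0 / (k + 1) ** 2

print(partition(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```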


*UPR A0271, CNRS, France;

Unit affiliated by agreement with the Université Pierre et Marie Curie, Paris, France.
THE EPSTEIN-NESBET SECOND-ORDER CORRECTION TO THE DENSITY MATRIX.
The Møller-Plesset partition is defined as (refs. 13,12):

$$ H_{\mathrm{exact}} = H^{(0)} + V = \sum_i F(i) + V \qquad (1) $$

where $F$ is the closed-shell restricted Hartree-Fock operator and $\sum_i$ stands for the
summation over the $N$ electrons of the molecular system, while the Epstein-Nesbet
partition is:

$$ H_{\mathrm{exact}} = H'^{(0)} + V' = \sum_i F(i) + \sum_K \langle K|V|K\rangle \, |K\rangle\langle K| + V' \qquad (2) $$

where $\sum_K$ stands for the summation over all the eigenfunctions of $H^{(0)}$ (and $H'^{(0)}$):

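$$ |K\rangle = \frac{1}{\sqrt{N!}} \sum_P (-1)^p P \left[ \varphi_{K_1}(\mathbf{r}_1)\,\sigma_{K_1}(s_1)\, \varphi_{K_2}(\mathbf{r}_2)\,\sigma_{K_2}(s_2) \cdots \varphi_{K_N}(\mathbf{r}_N)\,\sigma_{K_N}(s_N) \right] \qquad (3) $$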

where the molecular orbitals (MO) $\varphi_{K_i}$, issued from the SCF calculation, are expanded
on atomic orbitals (AO) $\chi_p$ as:

$$ \varphi_{K_i} = \sum_p c_{p,K_i}\, \chi_p \qquad (4) $$

While the theoretical investigation of the electronic density at the SCF level is very
popular, as far as we know the corresponding investigation beyond this level has so far been
limited (refs. 14,15 and references therein). In the framework of the perturbation
expansion, the second-order correction to the average value of the density operator
$\hat{\rho}$ for the electronic ground state is (assuming real basis sets):


(2) 
where

- $\hat{\rho}(\mathbf{r}) = \sum_i \delta(\mathbf{r} - \mathbf{r}_i)$ ($\delta$ is the usual Dirac function);

- $\Psi_0^{(0)}$, $\Psi_0^{(1)}$ and $\Psi_0^{(2)}$ are the zeroth-order wave function of the reference state and its
first- and second-order corrections, respectively.

On the basis of the classical expressions of $\Psi_0^{(1)}$ and $\Psi_0^{(2)}$, the Epstein-Nesbet
second-order correction to the density matrix elements may easily be written as:
CALCULATIONS AND RESULTS

The electronic ab initio calculations have been carried out using the MONSTERGAUSS
program (ref. 16). The SCF process within the closed-shell restricted Hartree-Fock
formalism has been performed using the 6-31G**, 6-31G* and 6-31G basis sets for
the electronic ground states of H2O, H2CO and H2NNO2, respectively (so as to have 25 to
40 basis functions at this stage of preliminary test calculations). The integral
transformation from the AO to the MO basis has involved all the MOs (no frozen occupied or
virtual MOs). The SCF calculations and integral transformations were run on the SUN 3/260
workstation (see Fig. 1).
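The "25 to 40 basis functions" range can be checked by counting contracted functions per atom. A short worked check, assuming 6-31G gives 2 functions per hydrogen and 9 per first-row atom, that the * adds one set of 6 Cartesian d functions per heavy atom (the convention of the original 6-31G* definition; the program's actual convention is not stated) and that ** also adds 3 p functions per hydrogen:

```python
BASE_631G = {"H": 2, "C": 9, "N": 9, "O": 9}  # 6-31G contracted functions per atom
HEAVY_D = 6  # assumed: one Cartesian d shell per heavy atom for * and **
H_P = 3      # one p shell per hydrogen for **

def n_basis(formula: dict, basis: str) -> int:
    """Number of contracted basis functions for a molecule given as {element: count}."""
    total = 0
    for element, count in formula.items():
        per_atom = BASE_631G[element]
        if element != "H" and basis in ("6-31G*", "6-31G**"):
            per_atom += HEAVY_D
        if element == "H" and basis == "6-31G**":
            per_atom += H_P
        total += count * per_atom
    return total

print(n_basis({"H": 2, "O": 1}, "6-31G**"))         # 25 (H2O)
print(n_basis({"H": 2, "C": 1, "O": 1}, "6-31G*"))  # 34 (H2CO)
print(n_basis({"H": 2, "N": 2, "O": 2}, "6-31G"))   # 40 (H2NNO2)
```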


[Figure: workstations linked by ETHERNET; the SUN 3/260 performs the SCF calculation and integral transformation, the BULL SPS 7 workstations the parallel processing of the Epstein-Nesbet second-order correction to the molecular electronic density.]

Fig. 1. The three workstations of the local area network used in our calculations.
The unformatted direct-access output file fort.8 of the SCF process has been saved
and transferred to the SPS 7 workstation(s) with the usual remote copy command of the
ETHERNET network, so that the MO vectors (coefficients) $c_{p,i}$ and the overlap
integrals in the AO basis (for the population analysis) are available there. Similarly, the
unformatted sequential-access output file fort.10, resulting from the integral transformation,
has been saved and transferred to the SPS 7 workstation(s) to provide the transformed one-
and two-electron integrals. (Note that unformatted files can be moved from one workstation
to another here because all of them are based on the MC 68XXX processor family.)

The Epstein-Nesbet second-order correction to the density matrix has been calculated
on the three CPUs of the multiprocessor BULL SPS 7 workstation alone, or on this
workstation and the one-CPU SPS 7 together (technical details are available
on request; see also ref. 6).

The usual parameter used to evaluate the performance of parallel calculations
is the speed-up $S_p = T_1/T_p$, where $T_1$ and $T_p$ are the times for the algorithm to run
on one CPU (sequential calculation) or on $p$ CPUs (parallel calculation), respectively. The
values of this parameter are reported in Table 1 for three test calculations.

TABLE 1

The speed-up Sp for some preliminary test calculations.

Test calculation     Sp, 3-CPU SPS 7 alone (p = 3)     Sp, 3-CPU plus 1-CPU SPS 7 (p = 4)
H2O / 6-31G**        2.89                              3.81
H2CO / 6-31G*        2.90                              3.83
H2NNO2 / 6-31G       2.92                              3.87

These values of the speed-up show that the present Network Computing approach is well
suited to the second-order correction to the density matrix elements (the theoretical limits
of Sp are 3 and 4 for p = 3 and 4, respectively). Moreover, the excellent "cost /
performance" ratio of mass-produced microprocessors, which has led to the well-known
"downsizing" in a wide range of applications, ensures the low cost of such
investigations. Consequently, we can claim that these first results support the view
that Network Computing is an efficient and inexpensive tool for theoretical studies of
molecular properties.
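A convenient way to read Table 1 is the parallel efficiency E_p = S_p / p, where 1.0 would be perfect linear speed-up. The short computation below uses the table values; for p = 4 it assumes the CPUs of the two SPS 7 workstations have equal speed, which the text implies but does not state. All efficiencies come out above 0.95:

```python
# Speed-ups transcribed from Table 1: (S_p for p = 3, S_p for p = 4)
TABLE_1 = {
    "H2O / 6-31G**": (2.89, 3.81),
    "H2CO / 6-31G*": (2.90, 3.83),
    "H2NNO2 / 6-31G": (2.92, 3.87),
}

def efficiency(speedup: float, p: int) -> float:
    """Parallel efficiency E_p = S_p / p (1.0 = ideal linear speed-up)."""
    return speedup / p

for name, (s3, s4) in TABLE_1.items():
    print(f"{name:16s} E_3 = {efficiency(s3, 3):.3f}   E_4 = {efficiency(s4, 4):.3f}")
```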
COMPUTATIONAL CHEMISTRY ON SUPERCOMPUTERS
This paper summarizes the introduction to the panel discussion about Supercomputing
in  Chemistry.  It  presents  the  State  of  the  Art  in  Supercomputing,  the  new  wave 
constituted  by  massively  parallel  architectures,  the  new  trends  and  a  glossary  of 
technical  terms. 

Introduction  : 

To introduce the subject, we begin with a few quotations.

Klaus Schulten, University of Illinois, Urbana-Champaign.

"The supercomputer is like an intelligent microscope. It allows you to look into the
protein, to magnify it, and to slow down its motion."

Jacob V. MAIZEL, National Cancer Institute, Laboratory of Mathematical Biology.

"Using the supercomputer for computational simulations of biochemical systems
can both eliminate experimental work on unpromising drug candidates and
provide insights leading to new approaches."

David  A.  DIXON,  Du  Pont. 

"  The  combination  of  high-speed  computers,  theoretical  methods,  and  software 
now  makes  it  possible  to  perform  simulations  of  complex  systems  and  processes 
of  real  commercial  interest  to  the  chemical  industry." 


The  State  of  the  Art  in  Supercomputer  Architecture  : 

1. A short review of supercomputer history :

1960 : Univac and CDC build the first supercomputers. IBM followed with its
360/91 model. The well-known CDC 6600, designed by Seymour Cray, is the best
example of this generation of supercomputers.

1969:  CDC  builds  the  7600. 

1976  :  CRAY  1  :  the  first  vector  processing  supercomputer. 

1981 : CDC challenges CRAY with the CYBER 205.

1982 : CRAY XMP/2 : Cray boosts vector processing with parallelism.


1985 : CRAY 2 with large memory size. The Japanese enter the competition:
FUJITSU VP 200, HITACHI S810 and NEC SX1-SX2.

1986 : IBM builds the 3090 with Vector Facility.

1987 : CDC builds the ETA-10 series.

1988  :  CRAY  launches  the  YMP,  HITACHI  the  S820  and  ETA  disappears  ! 

1989  :  New  supercomputers  are  announced  : 

-  FUJITSU  VP2000, 

-  NEC  SX-X, 

- SUPERCOMPUTER SSI,

-  CRAY  3  :  new  technology  (GaAs),  16  processors  each  10C  times  faster 
than  the  CRAY  1. 


2.  Some  data  representing  the  situation  of  Supercomputers  at  the  present  time. 

The size of the market and the distribution of supercomputers are as follows:

400  Supercomputers  installed  in  the  world  :  Cray  56%,  Fujitsu  20%,  Nec  6%,  etc. 
In  Europe  :  58  Cray,  12  CDC,  7  Siemens-Fujitsu,  etc. 

Present  day  performances  : 

- Cray YMP-8 : 1.6 GFlops,

- Nec SX-3 : 680 Mips and 22 GFlops (announced),

- Fujitsu VP2000 : 4 GFlops.

Which Operating System?

UNIX® is becoming the standard supercomputer operating system, and local area
networks connect supercomputers and graphic workstations. Currently there is a
new trend: hierarchical computing.

3.  Important  comment : 

With more and more power available, it is increasingly critical to program
supercomputers efficiently, which requires a good
knowledge of:

-  Supercomputer  architecture, 

-  Vectorization  techniques, 

-  Program  parallelization  with  loop  splitting. 

This avenue of research constitutes a major objective for the companies
manufacturing  Supercomputers. 


The  New  Wave  :  Massively  Parallel  Computers 

Given the limitations of technology, the main way to increase
computation speed is to exploit parallel architectures.


®  UNIX  is  a  trade  mark  of  AT&T. 


In  the  early  '80s  a  new  concept  appeared,  namely  the  network  of  tightly  coupled 
processors. 

1. First, recall Michael FLYNN's classification :


-  MIMD  :  Multiple  Instruction  Multiple  Data. 

-  SIMD  :  Single  Instruction  Multiple  Data. 

The  main  difference  between  these  two  architectures  is  the  number  of  Control  Units.  In 
a  MIMD  computer,  each  Processing  Unit  is  associated  with  a  Control  Unit  whereas  in  a 
SIMD  computer  there  is  only  one  single  Control  Unit  for  all  the  Processing  Units. 

[Diagram: MIMD and SIMD architecture models. PU: Processing Unit; CU: Control Unit.]

Diagram 1: Comparison between the MIMD and the SIMD architecture models.

2. Granularity : In such architectures, the 'size' of the processing element has
to be chosen.

This 'size' is referred to as granularity :

In a fine-grain architecture, each processing element manipulates a small set of
bits with elementary operations, whereas in a coarse-grain architecture, each
processing element is a complete processor, usually working on 32-bit
data.
Usually, fine granularity is associated with a SIMD approach, whereas coarse
granularity and MIMD are linked.

3. The communication network plays an important role, because massively parallel
processing needs a high data exchange rate between the processors. The network
interconnects the processors according to one of the following models :

(i) - Mesh Array,

(ii) - Perfect Shuffle,

(iii) - Hypercube,

(iv) - Pyramid.


It is important to balance the cost of data exchange between the processors
against the gain from parallel processing.

N.B. (ii) and (iii) represent identical environments, but whereas the Perfect Shuffle is
dynamic, the Hypercube is static.
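Both interconnection schemes are easiest to see on binary node addresses. In a hypercube of dimension d, the 2^d nodes are d-bit numbers and each node is wired to the d nodes that differ from it in exactly one bit. The perfect shuffle (Stone) instead connects a node to the node whose address is its cyclic left rotation, usually paired with an "exchange" link that flips the lowest bit. A minimal sketch; the function names are illustrative:

```python
def hypercube_neighbours(node: int, dim: int) -> list:
    """Neighbours of `node` in a dim-dimensional hypercube: flip one address bit."""
    return [node ^ (1 << k) for k in range(dim)]

def hypercube_distance(a: int, b: int) -> int:
    """Minimum number of hops = number of differing address bits (Hamming distance)."""
    return bin(a ^ b).count("1")

def perfect_shuffle(node: int, dim: int) -> int:
    """Shuffle link on 2**dim nodes: cyclic left rotation of the dim-bit address."""
    msb = (node >> (dim - 1)) & 1
    return ((node << 1) & ((1 << dim) - 1)) | msb

def exchange(node: int) -> int:
    """Exchange link: flip the lowest address bit."""
    return node ^ 1

print(hypercube_neighbours(0, 3))              # [1, 2, 4]
print(perfect_shuffle(0b011, 3))               # 6 (0b110)
print(len(hypercube_neighbours(0, 16)))        # 16 links per node in a 65536-node cube
```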

4. The application fields :

The applications of supercomputing to number crunching, mathematics and signal
processing are of obvious importance. It is also useful for pattern matching in
databases, exploiting the associative-memory feature, for image processing and
ray tracing, and, last but not least, in Artificial Intelligence, Expert Systems
and Pattern Recognition.

5. The historical steps which led to the design of the Connection Machine :

[Diagram: Signal Processing (1965-1970) --> Parallel Computing --> Perfect Shuffle Algorithm (Stone) --> ILLIAC IV --> HYPERCUBE --> Connection Machine; Artificial Intelligence (1975) --> Semantic Networks (Kowalski) --> Tightly Coupled Processor Network --> Connection Machine.]

A short Review of Massively Parallel Computers :

Several companies offer products of this type :

The  Butterfly  Computer  for  Signal  Processing  made  by  BBN  Advanced  Computer  Inc. 

It is an MIMD architecture with 256 coarse-grain processors. Each processor consists
of a Motorola 68020 with a floating-point coprocessor and 4 Mbytes of local
memory. Data exchange is handled by routing processors, and the network links
the processors according to the Perfect Shuffle model.
The Connection Machine made by Thinking Machines Corporation :

This  is  a  SIMD  architecture  built  up  of  a  dimension  16  Hypercube  comprising 
65536  fine  grain  processors.  Each  processor  manipulates  only  1  bit  and  it  has 
4Kbit  local  memory.  The  peak  performance  is  12  GIPS  and  the  mean  is  2  GIPS. 

The  hardware  organization  follows  this  hierarchical  model  :  one  chip  contains  16 
processors  interconnected  by  bus,  one  board  contains  32  chips  and  the  machine 
contains  128  boards.  The  4096  chips  of  the  Hypercube  computer  are 
interconnected  by  a  routing  processor. 

Intel Scientific Computers produce the IPSC.

This is an MIMD architecture consisting of a dimension-7 Hypercube with 128
coarse-grain processors. Each processor is an 80286 with an 80287 coprocessor in the
IPSC, or an 80386 with an 80387 in the IPSC/2. Various configurations with 1, 4, 8 or 16
Mbytes of local memory with cache are available. The network is controlled by a
direct message router. The performance figures for the IPSC are 100 MIPS and 8
MFLOPS, whereas for the IPSC/2 they are 1 GIPS, 150 MFLOPS, and 1 GFLOPS
with the vector facility.

The  Transputer  by  INMOS  Limited  : 

The Transputer is a building block for MIMD massively parallel computers. The
Transputer T800 processor is coarse-grain, with a 32-bit RISC architecture, 4
Kbytes of local memory and 4 Gbytes of external memory. The network's
interconnection uses 4 links at 20 Mbit/s. The performance figures are
15 MIPS RISC and 2 MFLOPS. An original feature is the programming language
OCCAM, a parallel language based on Hoare's CSP (Communicating Sequential Processes) model.

Two  examples  of  existing  Transputer  based  Supercomputers  are  the  Computing 
Surface  from  Meiko  and  the  T-Node  from  Telmat.  Both  architectures  are  highly 
modular  with  4  to  1024  processors  and  up  to  4  Gbytes  memory.  In  addition,  the  T- 
Node  offers  a  dynamically  reconfigurable  network. 


A glossary of technical terms appropriate to the field :

-  PIPELINING  : 

Splitting the interpretation of each instruction into stages that are executed
sequentially, so that successive instructions can occupy different stages at the same time.

The  main  problems  are  the  flushing  of  the  pipeline  by  conditional  branches  and 
the  memory  becoming  the  bottleneck. 
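The benefit of pipelining, and the cost of a flush, can be seen with a simple cycle count. A minimal model, assuming one cycle per stage, no memory stalls, and a penalty of k - 1 refill cycles for each flush (for example after a conditional branch); these are simplifying assumptions, not a description of any particular machine:

```python
def pipelined_cycles(n_instructions: int, n_stages: int, flushes: int = 0) -> int:
    """Ideal k-stage pipeline: k cycles to fill, then one instruction completes per cycle.
    Each flush is assumed to cost k - 1 extra cycles to refill the pipeline."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + flushes * (n_stages - 1)

def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Same work without overlap: each instruction goes through all k stages alone."""
    return n_instructions * n_stages

n, k = 100, 5
print(sequential_cycles(n, k), pipelined_cycles(n, k))  # 500 104
print(pipelined_cycles(n, k, flushes=10))               # 144
```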

-  MEMORY  MANAGEMENT : 

-  Interleaved  Banks : 

To parallelize access to contiguous elements, the main memory is split into 2^n
independent banks. Two contiguous addresses map to two different banks, and
addresses within one bank progress with a step of 2^n.
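With 2^n banks, the low n bits of a word address select the bank and the remaining bits select the word within it. The sketch below (function names illustrative) shows why unit-stride access spreads over all banks while a stride equal to the number of banks hits the same bank every time and is serialized:

```python
def bank_and_offset(address: int, n: int):
    """Split a word address into (bank, word within bank) for 2**n interleaved banks."""
    return address & ((1 << n) - 1), address >> n

def banks_used(stride: int, length: int, n: int) -> int:
    """Number of distinct banks touched by a strided sweep of `length` words."""
    return len({bank_and_offset(i * stride, n)[0] for i in range(length)})

print(bank_and_offset(13, 3))                                        # (5, 1)
print(banks_used(1, 8, 3), banks_used(2, 8, 3), banks_used(8, 8, 3))  # 8 4 1
```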

-  Cache  Memory : 

Small but high-speed memory managed according to an LRU (Least Recently Used)
policy: when the cache is full, the least recently used entry is replaced, so the most recently used information is kept.
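The LRU replacement policy can be modelled in a few lines. This is a software sketch of the policy only (hardware caches work on fixed-size lines and sets); `LRUCache` is an illustrative name:

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity store that evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()  # order of keys = order of last use (oldest first)

    def get(self, key):
        if key not in self._data:
            return None             # miss: the item would be fetched from main memory
        self._data.move_to_end(key) # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes the most recently used
cache.put("c", 3)    # evicts "b", the least recently used
print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```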
- VECTOR PROCESSING :

A set of high-speed vector registers is available to the processor. These vector
registers are processed by special operators in a pipelined mode, the main problem
being the dependencies between the elements of the vectors, which cause the pipeline
to be flushed.
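
The dependency problem can be seen by comparing two execution orders for the recurrence a[i] = a[i-1] + b[i]. The sketch models a vector operation as "load every operand, then store every result", an assumption chosen to mimic a vector register operation; a loop without such a dependency (for example a[i] = b[i] + c[i]) gives the same result in both orders, whereas the recurrence does not. The function names are illustrative.

```python
def recurrence_scalar(a, b):
    """a[i] = a[i-1] + b[i] executed one element at a time (sequential semantics)."""
    a = list(a)
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]  # reads the value written by the previous iteration
    return a


def recurrence_vector(a, b):
    """The same loop with assumed vector-register semantics: load all operands first."""
    loaded = list(a)  # one vector load of a[0..n-1] before any store takes place
    result = list(a)
    for i in range(1, len(a)):
        result[i] = loaded[i - 1] + b[i]  # sees the old a[i-1], not the new one
    return result


if __name__ == "__main__":
    a = [1, 0, 0, 0]
    b = [0, 1, 1, 1]
    print(recurrence_scalar(a, b))  # [1, 2, 3, 4]
    print(recurrence_vector(a, b))  # [1, 2, 1, 1]: the dependency was ignored
```

This is why a vectorising compiler must either prove that no such dependency exists or leave the loop in scalar mode.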

-  AMDAHL'S  LAW  : 

This law predicts the performance of a machine that has two separate modes of
computing (scalar and vector) as a function of the fraction of the computation carried out
in each mode.
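
In its usual form the law gives the overall speed-up S = 1 / ((1 - f) + f / v), where f is the fraction of the original scalar run time that can be vectorised and v is the speed-up of the vector mode over the scalar mode. The sketch below simply evaluates that expression; the parameter names are ours.

```python
def amdahl_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Overall speed-up when a fraction of the scalar run time is executed
    vector_speedup times faster and the remainder stays scalar."""
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must lie in [0, 1]")
    if vector_speedup <= 0.0:
        raise ValueError("vector_speedup must be positive")
    scalar_time = 1.0 - vector_fraction          # untouched scalar part
    vector_time = vector_fraction / vector_speedup  # accelerated vector part
    return 1.0 / (scalar_time + vector_time)


if __name__ == "__main__":
    for f in (0.5, 0.8, 0.9, 0.99):
        print(f, round(amdahl_speedup(f, 10.0), 2))
```

Even with 90% of the run time vectorised and a vector mode ten times faster than the scalar one, the overall gain is only about 5.3 (1 / 0.19), so the scalar part quickly dominates.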

-  RISC  :  Reduced  Instruction  Set  Computer 

The main idea is to simplify and optimize the internal architecture of the processor
by optimizing the most frequently used instructions, using pipelining, enlarging the set
of internal high-speed registers and simplifying access to the memory.


NEW  TRENDS  TOWARDS  ARTIFICIAL  NEURAL  NETWORKS  : 

The  main  goal  is  to  mimic  human  thinking  for : 

(i)  -  Pattern  classification, 

(ii)  -  Speech  recognition, 

(iii)  -  Vision  processing. 

The method adopted is to simulate nerve cells, their interconnections and their patterns
of interaction.

Each processing element receives signals from neighbouring elements and,
according to a system of weighting factors, determines whether or not to pass the
signal further along the network.
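
The behaviour of a single processing element can be sketched as a weighted sum followed by a threshold. The step (threshold) transfer function, the function name and the particular weights below are assumptions chosen for illustration; with these weights the element behaves as a logical AND of its two inputs.

```python
def fires(inputs: list[float], weights: list[float], threshold: float) -> bool:
    """Pass the signal on (True) when the weighted sum of the incoming signals
    reaches the threshold; otherwise block it (False)."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input is required")
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation >= threshold


if __name__ == "__main__":
    weights, threshold = [1.0, 1.0], 1.5  # hypothetical values: behaves like AND
    for pattern in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(pattern, fires(pattern, weights, threshold))
```

Changing only the weights or the threshold (for instance a threshold of 0.5 gives OR) changes what the element computes, which is why learning consists of adjusting the weighting factors.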

The basic properties of neural networks are massive parallelism and adaptivity
(neurons work collectively), fault tolerance, training rather than programming (the
weighting factors are computed through learning), and data processing by
spreading activation from input to output.

The neural network works as a dynamical system controlled by a transfer function,
and it functions naturally as an associative memory.

The learning algorithm uses error back-propagation : the difference between the
expected and the obtained result is computed and the weights are corrected accordingly.
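
A minimal sketch of the error-correction idea for the simplest case: a single sigmoid unit with no hidden layer, so nothing yet has to be propagated backwards. The difference between the expected and the obtained output drives the weight correction; back-propagation proper applies the same correction layer by layer through the chain rule. The sigmoid transfer function, the cross-entropy error (for which the correction is exactly error times input), the learning rate, the number of epochs and the OR training set are all assumptions for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Assumed transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, epochs: int = 5000, rate: float = 0.5):
    """Batch error-correction training of a single sigmoid unit.

    For each sample the error is (expected - obtained); every weight is
    corrected in proportion to that error times the input that fed it.
    """
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for x, expected in samples:
            error = expected - predict(weights, bias, x)
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w + rate * g for w, g in zip(weights, grad_w)]
        bias += rate * grad_b
    return weights, bias


if __name__ == "__main__":
    OR_TABLE = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
    w, b = train(OR_TABLE)
    for x, t in OR_TABLE:
        print(x, t, round(predict(w, b, x), 3))
```
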

The domains of application are optimization problems (function minimization) and
pattern classification.

State of the Art : most current work consists either of simulation on supercomputers,
because the learning algorithm requires high-speed computers, or of specialised
hardware design based on analog or digital integrated circuits. Today we can fit 100
neurons on one chip, and in 50 years there will perhaps be computers with 10^… neurons.


HIGH  SPEED  GaAs  TECHNOLOGY  : 

The main properties of this technology are high electron mobility with high peak
electron velocity, and improved radiation tolerance over a wide range of operating
temperatures. GaAs reduces power consumption and improves speed : in fact GaAs
is 5 times faster than Si ECL technology.

State  of  the  art : 

Today GaAs VLSI integrated circuits contain 10^3 to 10^5 gates, whereas silicon VLSI
integrated circuits contain 10^7 gates. GaAs will never totally replace silicon; it will be
used in selected aerospace, defense and supercomputer applications.


Conclusion  : 

Supercomputers are now 30 years old ... I think we are in the Middle Ages !

Acknowledgements :

I  am  grateful  to  D.  TUSERA  of  the  INRIA  for  his  help  in  compiling  the  bibliography.  The 
text  in  English  was  reviewed  by  Ph.  HOGGAN. 

Bibliography  : 

V. Milutinovic : GaAs Microprocessor Technology. Computer, October 1986

L. E. Larson et al. : GaAs High-Speed Digital IC Technology: An Overview. Computer,
October 1986

H. S. Stone : Parallel Processing with the Perfect Shuffle. IEEE Transactions on Computers,
vol. C-20, no. 2, February 1971

W. D. Hillis : The Connection Machine. MIT Press, 1985

S. E. Fahlman et al. : Connectionist Architectures for Artificial Intelligence. Computer,
January 1987

O. Lubeck et al. : A Benchmark Comparison of Three Supercomputers: Fujitsu VP-200,
Hitachi S810/20 and Cray X-MP/2. Computer, December 1985




DISCUSSION 


The discussion started with short presentations by Messrs. Charles Henriet (Cray Research
France), Armand Herscovici (IBM France), Jean-Claude Reynaud (Convex S.A.) and Daniel
Urbain (FPS Computing). The following salient points were recorded.


Ch.  Henriet 

Exciting  new  developments  of  theoretical  and  computational  methods,  combined  with  major 
advances in computer hardware and software, will continue to make computational chemistry
one of the most fascinating fields of scientific computing. Progress in computational
chemistry continues to increase our ability to tackle more complex problems at a higher level
of accuracy than in the past. The most important opportunities for computational chemistry
appear in molecular biology, including pharmaceutical, agrochemical, and biotechnology
research, and materials sciences, especially polymer sciences, catalysis, and advanced
electronic, optical, and structural materials. We have reached a threshold where, perhaps for
the first time in chemical research, large-scale numerical simulations are becoming an
important element in industrial research. Given this emerging economic relevance of computational
chemistry and the continued progress in hardware and software, the future of computational
chemistry looks extremely promising. Three major technological factors will influence the
future  of  the  field  :  new  theoretical/computational  methods,  new  hardware,  and  new  software 
tools. 

Future  theoretical  developments 

The  most  fundamental  factor  is  the  development  of  new  computational  theories  and 
approaches.  During  the  past  decade,  single-reference  Hartree-Fock  methods  with  the  ability 
to  calculate  first  and  second  energy  derivatives  analytically  have  become  a  standard  tool  in 
theoretical organic chemistry and form the backbone of the development of force-field
parameters for molecular dynamics studies of biomacromolecules and polymers. In
traditional implementations of ab initio methods, integrals were stored externally. In such an
approach, the N^4 scaling problem constitutes a serious roadblock for the treatment of systems
with more than about 200 basis functions. Direct SCF methods, now available in programs
such as Gaussian-88, are circumventing this problem, and significantly larger systems can be
tackled. As it turns out, for systems with more than several hundred basis functions, the
effective scaling drops well below N^4. Hence, increased computer power and the
implementation of parallel algorithms open up the possibility of treating much larger systems at
the ab-initio level than previously thought possible.
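
A short count makes the storage problem concrete. The sketch assumes real basis functions, so that the two-electron integrals (ij|kl) have the usual 8-fold permutational symmetry, and 8 bytes per stored integral with no screening and no index overhead; the function names are illustrative.

```python
def unique_eri_count(n_basis: int) -> int:
    """Number of symmetry-unique two-electron integrals (ij|kl) over n_basis
    real basis functions, using the 8-fold permutational symmetry."""
    pairs = n_basis * (n_basis + 1) // 2  # unique (ij) pairs
    return pairs * (pairs + 1) // 2       # unique pairs of pairs, ~ N**4 / 8


def storage_gbytes(n_basis: int, bytes_per_integral: int = 8) -> float:
    """Space needed to store every unique integral (assumed: no screening, no indices)."""
    return unique_eri_count(n_basis) * bytes_per_integral / 1e9


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, unique_eri_count(n), round(storage_gbytes(n), 2), "GB")
```

For 200 basis functions this gives about 2 x 10^8 integrals, roughly 1.6 Gbytes at 8 bytes each, and doubling the basis multiplies the count by almost 16. Direct SCF avoids the storage altogether by recomputing the integrals in each iteration and skipping those that are negligibly small.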

We can also expect continued progress in high-precision correlated methods. In fact, for
small systems, it will become possible to carry out full CI calculations more and more readily,
thus establishing benchmarks against which various basis set truncations and limitations in the CI
expansion can be systematically evaluated. As correlated methods are applied to larger and
larger  systems,  it  can  be  expected  that  localization  of  correlation  could  keep  the  high  scaling 
factor  of  correlated  methods  within  reasonable  bounds,  similar  to  what  we  are  observing  for 
the  single-reference  methods. 

Two  newer  developments  in  the  treatment  of  the  many-body  problem  can  be  anticipated  to 
bear  fruit  in  computational  chemistry.  One  is  density  functional  theory,  where  programs  are 
now being developed that contain analytic first derivatives. This capability will allow the
systematic study of structural properties of a wide variety of molecules, clusters, solids, and
surfaces, including transition metals, rare earths, and actinides, thus expanding the scope of
systems that can be treated from first principles. Today, the most common implementation of
density functional theory is the so-called local density approximation (LDA), yet several ways
are currently being explored to go beyond this approximation. A close collaboration between
high-precision CI or multireference practitioners and developers of density functional theory
should prove extremely fruitful as we strive to develop more accurate and efficient methods
for the challenging many-body problem of correlated electron systems. Other interesting new
theoretical developments are the quantum Monte-Carlo methods. Although slow in coming,
there are promising signs that, by using pseudo-Hamiltonians, these methods could be
successfully applied to larger systems while avoiding the challenging computational problems
inherent in this approach.

The combination of electronic structure theory with molecular dynamics can be anticipated as
one of the most exciting future developments in computational chemistry, as it aims directly
at the simulation of dynamical phenomena involving the making and breaking of chemical bonds.
One can envision a combination of quantum mechanics and force field methods where active
sites of an enzyme or reactive centers of a catalyst are treated quantum mechanically,
whereas the surroundings are described by a quasi-classical force field. Another, more
profound  combination  between  quantum  mechanics  and  molecular  dynamics  is  possible  in 
the  Car-Parrinello  scheme  in  which  the  motion  of  atoms  and  changes  in  the  wave-functions 
are  conceptually  treated  on  the  same  footing. 

Future  hardware  -  network  supercomputing  in  heterogeneous  environments 

During  the  next  decade,  progress  in  computer  hardware  will  be  dominated  by  parallelism. 
While  the  performance  of  single  processors  will  be  pushed  by  further  reducing  the  clock- 
cycle  perhaps  to  1  ns  by  the  late  1990s,  major  performance  enhancements  will  come  from 
parallel processing and from large-memory architectures. Although certain tasks in
computational chemistry are intrinsically perfectly parallel, such as the evaluation of
two-electron integrals in ab-initio methods or the pair-wise energy evaluations in molecular
dynamics, it would be a mistake to assume that massively parallel architectures will be the
panacea of computing. There are steps in all large-scale simulations that are best performed
in a shared-memory architecture. Perhaps the proper balance between powerful scalar and
vector shared-memory processors at a moderate level of parallelism, connected to highly
parallel distributed-memory processors in the network, will provide the best solution for
large-scale numerical tasks in chemistry.
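
The "perfectly parallel" character of pair-wise energy evaluation can be sketched as follows: the list of atom pairs is split into independent chunks, each chunk is summed without reference to the others, and only the final reduction combines the results. The Lennard-Jones potential (with epsilon = sigma = 1), the round-robin chunking and the use of Python threads are assumptions for illustration; because of CPython's global interpreter lock, the threads show the decomposition rather than a real speed-up, for which processes or a compiled language would be needed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import math


def lj_pair_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones energy of one atom pair at separation r (assumed potential)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def partial_energy(positions, pairs) -> float:
    """Energy of a subset of pairs; needs no data from any other subset."""
    return sum(lj_pair_energy(math.dist(positions[i], positions[j])) for i, j in pairs)


def total_energy(positions, n_workers: int = 4) -> float:
    """Split the pair list into n_workers independent chunks, evaluate them
    concurrently and add the partial sums (the only communication step)."""
    pairs = list(combinations(range(len(positions)), 2))
    chunks = [pairs[w::n_workers] for w in range(n_workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda chunk: partial_energy(positions, chunk), chunks)
        return sum(partials)


if __name__ == "__main__":
    grid = [(float(x), float(y), float(z)) for x in range(3) for y in range(3) for z in range(3)]
    print(total_energy(grid, n_workers=4))
```
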

We believe that the intimate connection of desk-top graphics workstations with
supercomputers and large data-base systems has the potential for an extraordinarily
productive environment. The key is connectivity between heterogeneous hardware, which will be
enabled by software standards. It appears that UNIX is becoming such a desired standard,
perhaps initially with TCP/IP as network protocol.

Since  the  bandwidth  of  networks  can  be  expected  to  remain  the  critical  bottleneck  in  a 
network-supercomputing environment, sufficient local and desk-top computer power will be
essential  for  adequate  load  balancing  and  data  compression  over  the  network. 

Future  software  languages  and  tools 

During the past two decades, FORTRAN was the standard for scientific programming. There is
an incredible wealth of chemistry software written in this language, representing an investment of
effort that is probably measured in thousands of man-years. Concurrently with the software
developments, major progress has been made in FORTRAN compiler technology and code
optimization techniques, which also represent an enormous economic investment. As we
move to new software environments, it will be wise to design them in such a way that the
earlier programming work can be reused. For example, as more and more software is being
written in C, this compatibility can be achieved since C and FORTRAN coexist quite well. In
fact, the trend seems to be that the computationally intensive parts will be written in
FORTRAN, whereas the program-system interfaces, such as connections to graphics
workstations and the handling of data transfer across networks, will be expressed in C, taking
advantage of the obvious relationship between C and UNIX.

Today, parallelism in FORTRAN codes is usually handled via compiler directives that are
inserted automatically by pre-processors or manually by the programmer. We will have to
wait and see which standards will emerge for expressing parallelism within existing computer
languages or perhaps within new languages. Perhaps the key issue here is not so much whether we
need yet another language on the same level as FORTRAN, C, or PASCAL, but rather the
creation of higher levels of language for expressing mathematical and chemical concepts.

An  important  trend  in  computational  chemistry  will  be  the  embedding  of  numerically  intensive 
tasks  in  an  environment  that  allows  the  convenient  manipulation  of  graphical,  symbolic  and 
logical  information.  Such  an  environment  will  enable  the  research  chemist  to  combine  in  a 
productive  fashion  information  from  experience  and  experiment  with  computed  data  from 
simulations. Expert systems may assist the non-expert in carrying out meaningful calculations,
guiding the choice of the computational approach and assisting in the interpretation and
assessment of results. Nevertheless, these systems will not replace the need for more
education of practicing and future chemists. In fact, education is clearly the most critical
component of all to ensure a prosperous future for computational chemistry. A wider group of
chemists need to be taught how to use all the powerful tools of computational chemistry.

In  conclusion,  we  are  excited  about  many  future  opportunities  of  computational  chemistry  as 
applied  to  the  fascinating  areas  of  the  life-sciences  as  well  as  materials  sciences.  Hopefully, 
continued  advances  in  computational  chemistry  will  make  this  branch  of  science  a  widely 
used  discipline  and  true  partner  to  the  experimental  efforts  as  we  strive  to  solve  the  many 
challenging  problems  such  as  cures  for  cancer  and  AIDS,  the  development  of  novel 
electronic,  optical,  magnetic,  and  structural  materials,  and  the  development  of  safer  chemical 
processes  and  compounds  that  are  environmentally  sound. 


A.  Herscovici 

Numerically intensive computing (NIC) spread widely during the 1970s, when the petroleum
problem generated a worldwide economic crisis, with an emerging, tough competitive
environment. To meet these new conditions, companies had to add new
components to their strategies.

One of them consisted in organizing their R&D departments so as to enable them to release
better-optimized new products more and more quickly, in order to face competition efficiently.
This meant shorter development cycles and more discrete simulations, both of which led to more
and more powerful computers. We can consider this to be the very origin of the strong growth of
the NIC market, estimated at above 39% per year worldwide.

Research organizations fully participated in this scheme, as most of the advanced applications
that run on supercomputers are conceived within the research world.

Calculations generated by this new environment were huge compared with the computing
power available. This resulted in enormous CPU times, which led to the main concern : increasing
computing power in order to shorten the NIC process. Most people were implicitly
considering that increasing computer speed and shortening the development cycle were two
similar concepts.

Emphasis  was  given  to  supercomputers,  and  vector  speed  :  the  bigger  the  vector  peak  rate, 
the  more  efficient  the  supercomputer  from  a  development  cycle  point  of  view. 

Very quickly, supercomputer users (scientists, engineers) realised that applications could be
vectorised only to a certain extent, and that the CPU performance was a combination of
scalar and vector speed, scalar speed being a major contributor. The concept of the balanced
supercomputer was born.

Supercomputer  vendors  provided  more  and  more  powerful  machines,  as  technological 
advances  were  coming  fast.  Users  immediately  took  advantage  of  the  available  computing 
power, generating new CPU-bound calculations, and models grew enormously in size.

The result was dramatic and rather unexpected : the nature of NIC applications was
permanently changed.

To analyse this very deep change, let us go back 20 years, when NIC applications could be
characterised in a few words by the following spectrum : far above 90% in CPU, far below 10% in
I/O. As a consequence, computers for NIC had to emphasize CPU speed only.

Nowadays, this scheme is definitely obsolete. NIC applications use large quantities of
data and produce a huge amount of results : pre- and post-processing are taking on
tremendous importance, storage size problems appear, I/O management becomes a major
factor for performance ; true cooperative processing is mandatory.

Moreover, organisational problems appear which were unknown up to now : designing a finite
element mesh may take several months (for instance, for car body simulation in the automotive
industry). Companies can no longer afford to throw away this mesh after the calculation is
done : there must be some way to save it in a technical data base for later re-use in another
version of the body. This also applies to the enormous amount of results.

In addition, similar runs by different persons must be avoided. This is not an easy matter to
deal with, as many NIC organizations have not yet been able to take into account the
dramatic change in NIC applications, due to the speed of the phenomenon : most are still
working within an organization fitting the former spectrum, resulting in a partial loss of control :
very often, more than 50% of the runs could have been avoided. A DP organization taking
advantage of the technical data base has to be implemented, ensuring full control.

Another major issue is the fact that supercomputers are large, expensive systems, so that
management wishes the machine to be used simultaneously by many people. This leads to a
new requirement, rather seldom taken into account in NIC : optimizing the use of the system,
which is an operating system problem. The performance of the computer is no longer to be
measured on a kernel or a single application, but on a set of representative programs run
simultaneously : the machine with the best performance will produce the results for ALL the
programs in the shortest amount of time.

Today, shortening the research or development cycle implies taking into account all these
new parameters. NIC becomes mature only when it really meets these requirements. This goes
far beyond concentrating on CPU performance.


J.C.  Reynaud 

This  speaker  emphasized  the  prosperous  future  of  vectorized  superminicomputers  which 
have  a  very  attractive  performance/cost  ratio  due  to  the  fact  that  they  are  air-cooled.  He  also 
insisted  on  the  important  contribution  to  homogeneity  introduced  by  the  use  of  the  UNIX 
operating  system,  that  brings  a  new  freedom  to  the  users  for  choosing  the  best  hardware 
platform without being tied to a proprietary operating system. He concluded by saying :
"UNIX is a challenge for the computer manufacturers. Besides, to maintain our leadership in
air-cooled supercomputers we are developing a product strategy based on advanced technology,
such as CMOS VLSI, ECL and GaAs in future generations, in order to multiply machine
performance by 10 every 3 years."


D.  Urbain 

Mr. Urbain agreed on the importance of the UNIX operating system and of the vectorized
supermini, paying special attention to the RISC architecture and the role of a good
graphics display.


